WildChat, a dataset of ChatGPT interactions

May 24, 2024

In case you need a large dataset to train your chatbot — and who doesn’t these days amirite — WildChat might prove helpful.

“The WildChat Dataset is a corpus of 1 million real-world user-ChatGPT interactions, characterized by a wide range of languages and a diversity of user prompts. It was constructed by offering free access to ChatGPT and GPT-4 in exchange for consensual chat history collection. Using this dataset, we finetuned Meta’s Llama-2 and created WildLlama-7b-user-assistant, a chatbot which is able to predict both user prompts and assistant responses.”

Beats ripping off Scarlett Johansson dialogue.
