WildChat, a dataset of ChatGPT interactions

In case you need a large dataset to train your chatbot — and who doesn’t these days amirite — WildChat might prove helpful.

The WildChat Dataset is a corpus of 1 million real-world user-ChatGPT interactions, characterized by a wide range of languages and a diversity of user prompts. It was constructed by offering free access to ChatGPT and GPT-4 in exchange for consensual chat history collection. Using this dataset, we finetuned Meta’s Llama-2 and created WildLlama-7b-user-assistant, a chatbot which is able to predict both user prompts and assistant responses.

Beats ripping off Scarlett Johansson dialogue.