Manual removal of harmful text to train AI models

January 20, 2023

Topic

Statistics

AI training data comes from the internet, which, as we know but sometimes forget, includes harmful corners that are terrible for people. For Time, Billy Perrigo reports on how OpenAI outsourced the labeling of such data to a firm, which required workers to read disturbing text:

To build that safety system, OpenAI took a leaf out of the playbook of social media companies like Facebook, who had already shown it was possible to build AIs that could detect toxic language like hate speech to help remove it from their platforms. The premise was simple: feed an AI with labeled examples of violence, hate speech, and sexual abuse, and that tool could learn to detect those forms of toxicity in the wild. That detector would be built into ChatGPT to check whether it was echoing the toxicity of its training data, and filter it out before it ever reached the user. It could also help scrub toxic text from the training datasets of future AI models.

To get those labels, OpenAI sent tens of thousands of snippets of text to an outsourcing firm in Kenya, beginning in November 2021. Much of that text appeared to have been pulled from the darkest recesses of the internet.
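The premise the quote describes, training a classifier on human-labeled examples and then using it to filter model output, can be sketched in a few lines. This is a toy illustration, not OpenAI's actual pipeline: the examples, labels, threshold, and `filter_output` helper below are all placeholders, and a real system would use far larger datasets and stronger models than the TF-IDF plus logistic regression combination shown here.

```python
# Minimal sketch of a supervised text-toxicity filter, assuming a
# labeled dataset like the one the article describes. All data here
# is synthetic placeholder text, not real training data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder labeled snippets: 1 = flagged as harmful, 0 = safe.
texts = [
    "I will hurt you",          # flagged
    "you are worthless trash",  # flagged
    "have a great day",         # safe
    "thanks for the help",      # safe
]
labels = [1, 1, 0, 0]

# Learn word weights from the labeled examples.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

def filter_output(candidate: str, threshold: float = 0.5) -> str:
    """Suppress candidate text the detector scores as likely toxic.

    Hypothetical helper: a production filter would do much more
    (categories, severity, human review) than a single threshold.
    """
    p_toxic = clf.predict_proba([candidate])[0][1]
    return "[filtered]" if p_toxic >= threshold else candidate
```

The key point the quote makes is that the hard, human part is producing the labels; once they exist, the detector itself is standard supervised learning.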