Ben Welsh has a running list of the news organizations blocking OpenAI crawlers:
In total, 532 of 1,147 news publishers surveyed by the homepages.news archive have instructed OpenAI, Google AI or the non-profit Common Crawl to stop scanning their sites, which amounts to 46.4% of the sample.
The three organizations systematically crawl web sites to gather the information that fuels generative chatbots like OpenAI’s ChatGPT and Google’s Bard. Publishers can request that their content be excluded by opting out via the robots.txt convention.
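For a rough idea of what that opt-out looks like in practice, here is a minimal sketch that checks a robots.txt file against the three crawlers using Python's standard-library parser. The user-agent tokens (`GPTBot`, `Google-Extended`, `CCBot`) are my assumption about which names correspond to the three organizations, not something stated in the survey, and the `blocked_crawlers` helper and sample file are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens for OpenAI, Google AI, and Common Crawl.
# These names are not taken from the quoted survey.
AI_CRAWLERS = ("GPTBot", "Google-Extended", "CCBot")


def blocked_crawlers(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI crawlers that the given robots.txt text disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that opts out of two of the three crawlers.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_crawlers(SAMPLE))  # ['GPTBot', 'CCBot']
```

Note that robots.txt is only a request: this shows what a publisher has asked for, not whether a crawler honors it.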
On the web, it used to be that you would write or make something and there would be a link to the thing. Other websites could link to the thing, and people would go to the place with the thing. With this recent AI wave, a lot of the thing ends up elsewhere and no one sees the original place.
Fun times ahead.