Reddit reportedly blocking data scraping from Google and other search crawlers
Recent reports claim that Reddit, the news aggregator and community site, is planning to block AI startups from scraping data from its website. Should the company go through with it, search crawlers such as those used by Google and Bing may also be affected.
The claims originate from a Washington Post report stating that Reddit might remove the ability to log in to the site using Google credentials and block the tech giant's web crawlers from scraping the site. The report cited Reddit's recent struggles to reach an agreement with AI companies, such as Google, to pay for the data they take from the site.
Reddit later denied the report, though not in its entirety: it explicitly denied only the Google login portion. This left the second part, blocking web crawlers, open to interpretation.
What is happening with data scraping?
Recently, AI startups and the manner in which their chatbots are trained have become a subject of controversy for websites such as Reddit and X. As a result, several organizations have moved to block these attempts through API restrictions and limits. X owner Elon Musk has famously criticized AI startups for scraping his platform's data and blamed the practice for the recent API changes he implemented on the site.
Reddit faced a similar issue a few months ago, which led the company to follow X's lead in restricting API access, a move that caused considerable controversy and prompted many subreddits to permanently shut down. The issue now, however, appears to be search crawlers, which continue to scrape the site for free.
AI startups have traditionally relied on publicly available web data to train their chatbots and other AI models. This allows them to avoid the costly and time-consuming process of creating their own datasets. However, news organizations and other content creators have increasingly expressed frustration with this practice, arguing that AI startups are profiting from their work without paying for it.
However, blocking search engine crawlers from accessing its website would mean that Reddit content would no longer appear in Google and Bing search results. This would be a significant setback for Reddit, as search engines are a major source of traffic for the website.
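The report does not say how Reddit would block crawlers. One common mechanism is a robots.txt file, which tells named crawlers which paths they may fetch; compliance is voluntary, though major search engines generally honor it. The sketch below is illustrative only: the rules are hypothetical and not Reddit's actual file, and `crawler_access`, `SomeOtherBot`, and the example URL are made-up names. It uses Python's standard `urllib.robotparser` to show how such rules would shut out Google's and Bing's crawlers while leaving other agents unaffected.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; this is NOT Reddit's real robots.txt.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: bingbot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_access(robots_txt, agents, url):
    """Return a dict mapping each user agent to whether it may fetch the URL."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Hypothetical URL and agent list, chosen only for the example.
access = crawler_access(
    HYPOTHETICAL_ROBOTS_TXT,
    ["Googlebot", "bingbot", "SomeOtherBot"],
    "https://www.reddit.com/r/example/",
)
for agent, allowed in access.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Under rules like these, a compliant search crawler would stop fetching the site's pages, which is the situation the preceding paragraph describes: pages that are no longer crawled gradually fall out of search results.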