Cloudflare, the internet infrastructure company responsible for routing about 20% of global web traffic, has announced it will begin blocking artificial intelligence (AI) crawlers by default.
The policy, effective Tuesday, changes how AI companies can access content hosted on the web, a shift that follows publishers' push for more control over, and compensation for, their data.
The content delivery network (CDN) helps websites cache and serve data closer to users. Under the new policy, any new domain signing up for Cloudflare services will be prompted to decide if and when AI bots can access its content, or to block scrapers altogether.
The change builds on Cloudflare’s earlier initiatives to give publishers more control over their data. Last year, the company introduced a one-click option to block all known AI bots and a dashboard to monitor crawler activity. Site owners can use these tools to distinguish crawlers scraping data for AI training from those crawling for search or other purposes.
Tuesday’s announcement formalizes those protections and enforces them by default. “AI crawlers have been scraping content without limits. Our goal is to put the power back in the hands of creators, while still helping AI companies innovate,” Cloudflare CEO Matthew Prince said in a statement accompanying the announcement.
According to the company, Cloudflare’s Pay per Crawl system, the foundation of this initiative, is a marketplace where AI companies and content owners can agree on a price for each crawl.
Both parties must have Cloudflare accounts, and once set up, they can negotiate prices and terms for web crawling activities. Cloudflare acts as a broker in the transaction, charging the AI company and passing the earnings to the publisher.
Several AI developers have declined to participate in the program, including OpenAI, the Microsoft-backed firm behind ChatGPT. In a recent public statement, the company criticized Cloudflare for inserting a new intermediary between publishers and AI developers.
OpenAI noted its history of honoring the robots.txt protocol, a plain-text file that lets website operators tell crawlers which parts of a site they may access, and insisted that it respects site preferences.
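For illustration, a minimal robots.txt entry barring OpenAI’s documented crawler, GPTBot, from an entire site takes just two lines at the site’s root (this example is generic and not drawn from Cloudflare’s announcement):

User-agent: GPTBot
Disallow: /

Compliance is voluntary, however: the protocol only works when a crawler chooses to honor it, which is the gap Cloudflare’s default blocking is designed to close.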
In a June analysis, Cloudflare said it found a wide gap between how often crawlers scrape sites and how much traffic they refer back. Google’s crawler, for example, accessed websites 14 times for every visit it sent back; OpenAI’s bot scraped sites 17,000 times for every referral.
UK-based technology lawyer Matthew Holman told CNBC that AI crawlers can be intrusive and potentially harmful to user experience.
“They have been accused of overwhelming websites and significantly impacting user experience,” he said. Holman added that if Cloudflare’s system works as intended, it could curb the ability of AI chatbots to gather and train on large-scale web data.
Major media companies support Cloudflare’s efforts to reclaim control over digital content. Publishers including TIME, The Associated Press, Condé Nast, The Atlantic, ADWEEK, and Fortune have all agreed to block AI bots by default.
Media outlets have long tolerated scraping by platforms like Google in exchange for search traffic and the ad revenue it brings. The current AI-driven ecosystem offers no such reciprocity: platforms like ChatGPT and Claude consume content without sending meaningful engagement or revenue back to the original sources.
Cloudflare says it will continue working with AI companies to require that crawlers seeking access disclose their identity, purpose, and crawling behavior.
“Original content is what makes the Internet one of the greatest inventions in the last century,” Prince added. “We have to come together to protect it.”