More Trouble For Google's AI? New York Times, CNN, And Others Block Data Access, Potentially Disrupting Tech Giant's Training Models

Several top websites have started blocking Alphabet Inc.’s Google GOOGL GOOG from using their content to train AI models. This move is seen as a response to the potential threat AI models pose to the traditional web traffic distribution system.

What Happened: The new tool, Google-Extended, allows website owners to prevent Google from using their content for AI model training. This tool has been adopted by approximately 10% of the top 1,000 websites, as per data from Originality.ai, reported Business Insider on Thursday.

Notably, The New York Times, which is currently embroiled in an AI copyright dispute with ChatGPT-parent OpenAI, has also implemented the Google-Extended blocker. The publication has also restricted OpenAI’s access to its content.

Other major websites that have activated the Google-Extended tool include CNN, BBC, Yelp, and Business Insider.

Despite this, the report noted that Google-Extended has seen less adoption than other AI training data blockers, such as OpenAI’s GPTBot, which is active on around 32% of the top 1,000 websites.

Jonathan Gillham, CEO of Originality.ai, explained that the lower adoption of Google-Extended could be due to the potential risk of websites not being included in AI-generated results if they block Google’s access to their data.

“If a query is ‘What is the best deep dish pizza in Chicago?’ and a Pizza shop excludes Google’s AI from using its website data to train on, then it will not have any knowledge of that restaurant and be unable to include it in its response,” Gillham explained.

Why It Matters: This resistance from top websites comes amid concerns about AI models’ impact on traditional web traffic distribution.

In December last year, The Atlantic expressed concerns about Google’s AI-enhanced search potentially answering user queries directly, thus reducing the need for users to visit external sites. This could lead to a significant decrease in traffic for publishers like The Atlantic, which relies heavily on Google for about 40% of its web traffic.

Meanwhile, the AI copyright dispute between The New York Times and OpenAI is part of a larger conflict in the AI industry. This conflict has recently escalated, with Elon Musk accusing OpenAI of stealing “everything” after doubts emerged about the company’s data sourcing for its AI model, Sora.

Photo: Courtesy Jonny Gios via Unsplash

Check out more of Benzinga's Consumer Tech coverage by following this link.

Subscribe to the Benzinga Tech Trends newsletter to get all the latest tech developments delivered to your inbox.

This content was partially produced with the help of Benzinga Neuro and was reviewed and published by Benzinga editors.

GOOGAlphabet Inc

$159.282.52%