Nvidia Reportedly Involved In Extensive Video Scraping, Involving Popular Tech YouTuber Marques Brownlee, Among Others: 'MKBHD Videos? Yeah Grab Those Too'

Leaked internal messages from NVIDIA Corp. NVDA have revealed that the company has been scraping videos from Alphabet Inc.’s GOOGL GOOG subsidiary YouTube and other sources to train its AI models. The videos include those of popular tech YouTuber Marques Brownlee, also known as MKBHD.

What Happened: Brownlee highlighted leaked Nvidia messages that discussed the company’s AI training practices. According to the messages, Nvidia scraped videos from YouTube and other sources, including Brownlee’s channel, to train its AI models. The story was first reported by 404 Media on Monday.

Brownlee wrote on X, “Now leaked NVIDIA slack messages discussing which YouTube channels to scrape videos from. MKBHD videos? Yeah grab those too.”

Brownlee referenced a post by Jason Koebler, a journalist with 404 Media, who reported that leaked Slack messages and documents show the scale of Nvidia’s AI data scraping: the equivalent of 80 years, or “a human lifetime,” of video every day.

cool cool cool cool cool cool now leaked NVIDIA slack messages discussing which YouTube channels to scrape videos from. MKBHD videos? Yeah grab those too. https://t.co/0XczvTNVBH
— Marques Brownlee (@MKBHD) August 5, 2024

Nvidia employees were reportedly instructed to scrape videos from sources such as Netflix Inc. and YouTube to train AI models for Nvidia’s Omniverse 3D world generator, self-driving car systems, and digital human products.

When questioned about the legal and ethical implications of using copyrighted content to train its AI models, Nvidia stated that its practices were "in full compliance with the letter and the spirit of copyright law," according to the report.

However, internal conversations viewed by 404 Media show that when Nvidia employees raised concerns about the legal risk of using YouTube videos and datasets that academics had compiled for research purposes, managers assured them that the work had been cleared at the highest levels of the company.

“We respect the rights of all content creators and are confident that our models and our research efforts are in full compliance with the letter and the spirit of copyright law. Copyright law protects particular expressions but not facts, ideas, data, or information. Anyone is free to learn facts, ideas, data, or information from another source and use it to make their own expressions. Fair use also protects the ability to use a work for a transformative purpose, such as model training,” Nvidia told Benzinga.

Why It Matters: The issue of scraping content for AI training is not new. In July, Brownlee also voiced concerns about Apple Inc. using YouTube videos without creators’ consent. Brownlee noted that this problem is likely to persist.

Moreover, in June, OpenAI and Anthropic were reported to be ignoring web scraping rules, stirring controversy. They reportedly bypassed robots.txt, a voluntary protocol that websites use to tell automated crawlers which pages they may and may not access.
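
For illustration, a crawler that honors robots.txt checks a site's rules before requesting a page. The following is a minimal sketch using Python's standard urllib.robotparser module. The bot name ExampleAIBot, the example.com URLs, and the rules themselves are hypothetical and are not taken from the reports about OpenAI or Anthropic.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named AI crawler from the whole site
# and blocks every other crawler from /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for agent, url in [
        ("ExampleAIBot", "https://example.com/articles/1"),
        ("OtherBot", "https://example.com/articles/1"),
        ("OtherBot", "https://example.com/private/notes"),
    ]:
        print(agent, url, may_fetch(rules, agent, url))
```

The check is advisory: nothing in the protocol stops a crawler that skips it, which is why ignoring robots.txt is possible in the first place.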

In September, Elon Musk blamed AI scraping for the implementation of tweet paywalls on X, Inc., formerly Twitter Inc. Users now need an account to read tweets, and those wanting to view more than 600 posts per day must pay for Twitter Blue access.

Additionally, in June, Reddit Inc. announced a policy update to block automated content scraping, leading to a nearly 9% surge in its stock value. The company emphasized enforcing its Public Content Policy to prevent unauthorized scraping.
