Is OpenAI's Viral AI Video Generator Sora Trained On YouTube And Instagram? CTO Mira Murati Is 'Not Sure'

Leading AI startup OpenAI is facing accusations of violating data protection laws. The allegations stem from the lab’s possible use of public social media posts to train its new video generation model, Sora.

What Happened: In a recent interview with The Wall Street Journal's Joanna Stern, OpenAI’s CTO Mira Murati declined to say whether the startup, which is backed by Microsoft Corp. (MSFT), trained Sora on public videos from platforms owned by rivals such as Alphabet Inc.'s (GOOG, GOOGL) Google and Meta Platforms Inc. (META).

"We used publicly available data and licensed data," Murati said.

When asked if OpenAI used videos from YouTube, Facebook and Instagram, Murati said she was not "actually sure" and that she would not "go into the details." "If they were publicly available to use, there might be data [used]. But I'm not sure, I'm not confident about it," she said.

Her answers sparked speculation that OpenAI might have used publicly available videos from these platforms to build Sora's training dataset.

Although Murati did not confirm the exact sources, she indicated that videos from these platforms might have been used if they were publicly available.

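These practices have raised concerns about potential violations of data protection and privacy laws, including the European Union's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA).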

The issue is that even if a user deletes their social media post, their data may still be stored in Sora’s model weights and could potentially be extracted.

Currently, OpenAI does not have a mechanism to remove this data or to prevent it from being extracted.

Italy's data protection regulator has found OpenAI to be in violation of data protection regulations.

Why It Matters: OpenAI’s Sora has had a tumultuous journey since its high-profile debut.

The model’s weaknesses were exposed shortly after its launch, with errors in video generation causing concern.

Despite these setbacks, OpenAI continued to refine the model, with Murati revealing that the company was still “red teaming” Sora to find and fix possible flaws before its public release.

OpenAI has previously faced legal action over its use of data. In December, the New York Times and a group of Pulitzer-winning authors sued OpenAI and Microsoft for copyright infringement, alleging unauthorized use of their content to train AI technologies.

The current accusations add to the growing list of concerns surrounding OpenAI’s data usage practices.