74% Of DeepSeek's Output Mimics OpenAI's ChatGPT, New Study Reveals: 'It Does Raise Questions About Its Development'

AI detection firm Copyleaks has found that text generated by DeepSeek-R1 shows “stylistic overlaps” with OpenAI’s ChatGPT in 74.2% of instances.

What Happened: A Copyleaks study showed that text generated by DeepSeek-R1, the model from Chinese start-up DeepSeek, closely mirrored OpenAI’s style in over 74% of cases. This suggests that DeepSeek may have relied on OpenAI's model during training without authorization, according to the report.

The study used three AI classifiers to analyze each model's stylistic fingerprint, allowing text to be attributed to a specific model. Such model-specific attribution plays a vital role in safeguarding intellectual property and fostering ethical AI development.

DeepSeek and OpenAI did not immediately respond to Benzinga’s request for comment.

Interestingly, the text produced by most other models was easily identifiable as unique to each one, with DeepSeek the exception. For instance, Microsoft’s MSFT Phi-4 and xAI's Grok-1 exhibited no similarities to the other models, suggesting they were trained independently.

The majority of DeepSeek’s outputs were classified as having been generated by OpenAI’s models. “While this similarity doesn’t definitively prove or declare DeepSeek as a derivative, it does raise questions about its development,” said Shai Nisan, head of data science at Copyleaks.

Why It Matters: This revelation comes at a time when the AI industry is under scrutiny. Recently, Singapore authorities launched an investigation into potential fraudulent activities related to the shipment of Nvidia Corp. NVDA chips.

Furthermore, Dario Amodei, CEO of Jeff Bezos-backed Anthropic, has called for stronger U.S. AI safeguards. He warned that autocratic regimes like China and Russia could use AI to enhance their control and military capabilities.

The possibility that both models were trained on overlapping datasets cannot be ruled out, but Nisan argues that shared data alone would not produce matching styles. “Even if large language models draw from overlapping datasets, AI fingerprinting remains crucial. The sheer variety of elements, such as architecture, fine-tuning methods and generation techniques, ensures that each LLM develops a distinct writing style,” said Nisan.

The similarity between DeepSeek-R1 and OpenAI's models underscores the need for clear regulations and transparency in AI development to prevent potential misuse and protect intellectual property. It also calls into question DeepSeek-R1’s perceived level of innovation.

Nisan cautioned that the issue could have significant implications for the AI industry if left unregulated.
