Google's Alphabet And Amazon-Backed Anthropic Lead Effort To Redefine AI Evaluation Standards

Alphabet Inc (NASDAQ:GOOG), (NASDAQ:GOOGL) search arm Google and Amazon.com Inc (NASDAQ:AMZN)-backed Anthropic, a prominent AI startup, has announced a new initiative to fund the development of advanced AI benchmarks. The program aims to address the current inadequacies in AI benchmarking and provide a more comprehensive evaluation of AI models.

What Happened: Anthropic’s new program, revealed on Monday, will allocate funds to third-party organizations capable of creating benchmarks that can effectively evaluate the performance and impact of AI models, including generative models like Anthropic’s own Claude. Interested parties can submit their applications for evaluation on an ongoing basis.

Anthropic’s official blog post stated, “Our investment in these evaluations is intended to elevate the entire field of AI safety, providing valuable tools that benefit the whole ecosystem. Developing high-quality, safety-relevant evaluations remains challenging, and the demand is outpacing the supply.”

The current AI benchmarks are criticized for their inability to accurately represent how the average person uses the systems being tested. There are also concerns about the relevance of older benchmarks in measuring modern generative AI.

Anthropic’s solution is to create new, more challenging benchmarks that focus on AI security and societal implications. The company is calling for tests that evaluate a model’s ability to carry out tasks such as cyberattacks, weapon enhancement, and manipulation or deception of people through deepfakes or misinformation.

The company also aims to develop an “early warning system” for AI risks related to national security and defense. Anthropic’s program will also support research into benchmarks and “end-to-end” tasks that explore AI’s potential for aiding in scientific study, conversing in multiple languages, mitigating ingrained biases, and self-censoring toxicity.

Why It Matters: The launch of this funding program by Anthropic follows the recent unveiling of their most advanced AI model, Claude 3.5 Sonnet. This model, which outperforms its predecessor, demonstrates Anthropic’s commitment to pushing the boundaries of AI technology.

Moreover, Dario Amodei, CEO of Anthropic, has been vocal about the broader implications of AI on society. In an interview with Time Magazine in June, Amodei emphasized the need for a more comprehensive solution than Universal Basic Income to tackle AI-induced inequality. This reflects Anthropic’s focus on ensuring that AI advancements benefit the wider public.

Additionally, Anthropic has been involved in controversies, such as the alleged disregard for web scraping rules. This controversy highlights the ethical challenges that AI companies face in their quest for data to train their models.

Image Generated Using AI by MidJourney

This story was generated using Benzinga Neuro and edited by Kaustubh Bagalkote

Market News and Data brought to you by Benzinga APIs

Comments