Skip to main content

Anthropic Launches New Initiative for Third-Party AI Model Evaluations

Anthropic looks to fund the future of AI evals!
Created on July 2|Last edited on July 2
Anthropic has introduced a new initiative aimed at funding third-party organizations to develop comprehensive evaluations for advanced AI models. This initiative seeks to enhance the assessment of AI capabilities and risks, addressing the current limitations in the evaluation landscape. The primary goal is to elevate AI safety and provide valuable tools for the entire ecosystem.
Anthropic's focus is on three main areas: AI safety level assessments, advanced capability and safety metrics, and infrastructure for developing evaluations.

Cybersecurity

In AI safety level assessments, the initiative targets critical areas such as cybersecurity, where evaluations will measure models' capabilities in cyber operations, including vulnerability discovery and exploit development. Another priority is assessing models' potential to enhance or create chemical, biological, radiological, and nuclear threats. Evaluations will also measure models' autonomous capabilities, focusing on AI research and development, advanced behaviors, and resource acquisition. Additionally, the initiative will develop an early warning system to identify and assess national security risks, along with evaluations measuring models' amplification of persuasion-related threats like disinformation and manipulation.

Graduate-level Evals

For advanced capability and safety metrics, the initiative aims to develop evaluations that challenge AI models with graduate-level knowledge synthesis, hypothesis generation, and autonomous research project execution. Enhancing evaluations of models' abilities to detect harmful outputs, such as dual-use information and automated cyber incidents, is also a priority. Furthermore, the initiative will create capability benchmarks across multiple languages and develop evaluations that assess biases, discrimination, psychological influence, economic impacts, and other broad societal effects.

No-code Solutions

To support the development of these evaluations, Anthropic is funding platforms that enable non-coders to create robust evaluations. It will also develop diverse datasets to improve models' abilities to score outputs from other models and run large-scale trials to measure a model's impact through controlled comparisons.

The Challenge

Anthropic emphasizes that effective evaluations should be sufficiently difficult, unseen in the training data, efficient, scalable, and well-documented. They should also be developed or reviewed by domain experts, diverse in format, comparable to expert baselines, iterative, scalable, and realistic in threat modeling.
Interested parties can submit proposals via Anthropic’s application form. Selected proposals will receive funding and guidance from Anthropic’s experts, including the Frontier Red Team and Trust & Safety teams, to refine and enhance their evaluations. This initiative aims to set a new standard for comprehensive AI evaluation, inviting participation to shape the future of AI safety.
Tags: ML News
Iterate on AI agents and models faster. Try Weights & Biases today.