Open Deep Search: Bringing Open-Source Search AI to the State-of-the-Art Frontier
Open Deep Search (ODS) is an open-source framework designed to close the performance gap between proprietary search AIs, such as Perplexity Sonar Reasoning Pro and OpenAI's GPT-4o Search Preview, and their open-source counterparts. The key to this leap in capability is ODS's two-part architecture: the Open Search Tool and the Open Reasoning Agent. By integrating real-time web search with powerful reasoning logic, ODS not only matches but often surpasses the performance of leading commercial alternatives on benchmarks such as SimpleQA and FRAMES.

Core Components: Open Search Tool and Open Reasoning Agent
ODS is built around a modular design that allows users to pair any base LLM with two main components. The Open Search Tool improves upon existing open-source SERP APIs by introducing advanced features like intelligent query rephrasing, relevance-based snippet filtering, and targeted data extraction from sites like ArXiv, PubMed, and Wikipedia. This ensures that the LLM is given rich, high-quality context to reason over.
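To make that pipeline concrete, here is a minimal sketch of how an ODS-style search tool could be wired together. The function names (`rephrase_query`, `rerank_snippets`, `open_search`), the term-overlap reranker, and the generic SERP callback are illustrative assumptions, not the actual ODS implementation.

```python
# Minimal sketch of an ODS-style search pipeline (illustrative, not the real code).
# `llm` is any callable mapping a prompt -> string; `serp_search` stands in for
# whatever SERP backend is available.
from typing import Callable, Dict, List


def rephrase_query(llm: Callable[[str], str], query: str, n: int = 2) -> List[str]:
    """Ask the LLM for alternative phrasings to broaden retrieval coverage."""
    prompt = (
        f"Rewrite the following search query in {n} different ways, "
        f"one per line, keeping the original intent:\n{query}"
    )
    rewrites = [line.strip() for line in llm(prompt).splitlines() if line.strip()]
    return [query] + rewrites[:n]


def rerank_snippets(query: str, snippets: List[str], top_k: int = 5) -> List[str]:
    """Crude relevance filter: score snippets by query-term overlap.
    A real system would use an embedding or cross-encoder reranker."""
    terms = set(query.lower().split())
    scored = sorted(
        snippets,
        key=lambda s: len(terms & set(s.lower().split())),
        reverse=True,
    )
    return scored[:top_k]


def open_search(
    llm: Callable[[str], str],
    serp_search: Callable[[str], List[Dict]],
    query: str,
) -> str:
    """Rephrase, retrieve, filter, and assemble context for the reasoning agent."""
    snippets: List[str] = []
    for q in rephrase_query(llm, query):
        for hit in serp_search(q):
            # Trusted sources (e.g. Wikipedia, ArXiv, PubMed) could get deeper,
            # site-specific extraction here instead of raw SERP snippets.
            snippets.append(hit.get("snippet", ""))
    return "\n".join(rerank_snippets(query, snippets))
```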
The Open Reasoning Agent then takes over, orchestrating these tools to answer the user’s query through structured, multi-step reasoning. ODS offers two versions: ODS-v1, which uses the ReAct framework (reasoning and action with Chain-of-Thought prompting), and ODS-v2, which builds on CodeAct, using Python-based code execution for enhanced symbolic reasoning.
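A stripped-down view of the ReAct-style control flow might look like the loop below. The prompt format, the Action/Observation parsing, and the step budget are assumptions made for illustration rather than the exact ODS-v1 agent.

```python
# Illustrative ReAct-style loop: the agent alternates between a reasoning step
# and an action (search, calculate, or final answer). A sketch of the general
# pattern, not the exact ODS-v1 prompt or parser.
from typing import Callable, Dict


def react_agent(
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 6,
) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # The LLM emits "Thought: ..." followed by either an action or an answer.
        step = llm(transcript + "Thought:")
        transcript += f"Thought:{step}\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "Action:" in step:
            # Assumed format for this sketch: "Action: tool_name[argument]"
            action = step.split("Action:", 1)[1].strip()
            name, _, arg = action.partition("[")
            observation = tools.get(name.strip(), lambda _: "unknown tool")(arg.rstrip("]"))
            transcript += f"Observation: {observation}\n"
    return "No answer found within the step budget."
```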
Performance on Benchmarks
On the SimpleQA benchmark, a collection of adversarial factuality questions, ODS-v2 combined with DeepSeek-R1 achieves 88.3% accuracy, outperforming most closed-source models, including Perplexity Sonar Reasoning Pro, and approaching GPT-4o Search Preview. On the more complex FRAMES benchmark, which tests multi-hop reasoning and retrieval, ODS-v2 achieves 75.3%, exceeding GPT-4o Search Preview by nearly 10 percentage points. These gains come not from brute-force querying but from intelligently deciding when and how to search based on the model's internal reasoning.

How ODS Outperforms
The success of ODS lies in its adaptive reasoning strategy and high-quality search augmentation. In contrast to closed systems like Perplexity or GPT-4o that rely on black-box APIs and fixed retrieval strategies, ODS dynamically rephrases queries and runs additional searches only when necessary. This approach is especially evident in cases from the FRAMES dataset, where ODS identifies ambiguous questions, reruns searches, and applies tools like Wolfram Alpha to perform calculations or unit conversions.
For example, when asked to find the age of a 1975 poetry prize winner in 2014, ODS-v1 identifies the person (Cid Corman), determines their birth year, and correctly calculates their age using a math API. In contrast, Perplexity’s model produces an incorrect estimate by failing to resolve key ambiguities in the query.
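That rerun-the-search-then-delegate-the-math pattern can be sketched as a small routine. The helpers below (`search`, `calculator`, and the birth-year heuristic) are hypothetical stand-ins for the Open Search Tool and a Wolfram Alpha-style math tool, not the code ODS actually runs.

```python
# Hypothetical sketch of the adaptive "search again, then compute" pattern.
from typing import Callable


def answer_age_question(
    llm: Callable[[str], str],
    search: Callable[[str], str],
    calculator: Callable[[str], str],
    question: str,
) -> str:
    # Step 1: resolve who the question is about.
    context = search(question)
    person = llm(
        f"Context:\n{context}\n\nWho is the question about? "
        f"Answer with a name only.\nQuestion: {question}"
    )

    # Step 2: the first pass may not contain the birth year, so run a
    # targeted follow-up search instead of guessing.
    if "born" not in context.lower():
        context += "\n" + search(f"{person} year of birth")
    birth_year = llm(
        f"Context:\n{context}\n\nWhat year was {person} born? "
        f"Answer with a 4-digit year only."
    )

    # Step 3: delegate the arithmetic to a calculator tool rather than the LLM.
    return calculator(f"2014 - {birth_year.strip()}")
```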
ReAct vs. CodeAct
The ReAct-based ODS-v1 uses structured prompts to interleave reasoning steps with tool-calling actions, allowing flexible use of external tools. The agent also employs dynamic few-shot prompting, retrieving contextually relevant worked examples for each query, which contributes to its strong performance even with comparatively weaker base models like Llama3.1-70B.
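A simple way to approximate such dynamic few-shot selection is a nearest-neighbor lookup over a pool of solved examples; the embedding function and pool format below are assumptions rather than the paper's exact retrieval setup.

```python
# Sketch of dynamic few-shot selection: embed the incoming question and pull
# the most similar solved examples into the prompt prefix for the agent.
from typing import Callable, List, Tuple
import numpy as np


def select_few_shot(
    embed: Callable[[str], np.ndarray],
    pool: List[Tuple[str, str]],  # (question, worked solution) pairs
    question: str,
    k: int = 3,
) -> str:
    q_vec = embed(question)

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    ranked = sorted(pool, key=lambda ex: cosine(embed(ex[0]), q_vec), reverse=True)
    # Concatenate the top-k examples into a few-shot prompt prefix.
    return "\n\n".join(f"Q: {q}\nA: {a}" for q, a in ranked[:k])
```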
ODS-v2, on the other hand, replaces structured action-output logic with Python-based code generation. By using CodeAct and SmolAgents, ODS-v2 takes advantage of the natural expressiveness and compositionality of code, leading to significant performance boosts on tasks requiring computation or symbolic manipulation. This design also allows for modular and distributed reasoning pipelines.
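For readers who want to experiment, a minimal CodeAct-style agent in SmolAgents looks roughly like the snippet below. The class names (CodeAgent, DuckDuckGoSearchTool, LiteLLMModel) exist in recent smolagents releases, but the exact configuration ODS-v2 uses, including its custom search tool, is not shown here and may differ.

```python
# Rough sketch of a CodeAct-style agent built with smolagents. This is an
# approximation of the setup, not the ODS-v2 codebase itself.
from smolagents import CodeAgent, DuckDuckGoSearchTool, LiteLLMModel

# Any LiteLLM-compatible model id works here; DeepSeek-R1 via its API is
# shown purely as an example.
model = LiteLLMModel(model_id="deepseek/deepseek-reasoner")

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],  # a generic web-search tool stands in for ODS's Open Search Tool
    model=model,
)

# The agent writes and executes Python to combine search results, arithmetic,
# and intermediate variables, rather than emitting a fixed action schema.
print(agent.run("What is the capital of France, and what is its population divided by 1000?"))
```

Because the action space is Python itself, a multi-step plan (search, parse, compute) composes naturally within a single generated program.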
Ablation and Adaptivity
A detailed ablation study highlights the importance of each ODS component. Simply pairing a base model with the Open Search Tool gives a significant accuracy bump on SimpleQA but reduces performance on FRAMES due to a lack of multi-hop reasoning. Adding Chain-of-Thought and few-shot prompting to create the full ReAct-based agent (ODS-v1) restores and extends performance across both benchmarks. Finally, swapping in DeepSeek-R1 further boosts both accuracy and reasoning depth.
Adaptive search behavior is another key advantage. ODS-v2 uses more searches for FRAMES than SimpleQA, optimizing cost and speed without compromising accuracy. In contrast, existing proprietary models often run a fixed number of searches, regardless of the task complexity.
Closing the Open vs. Closed Gap
ODS is the first open-source framework that consistently matches or outperforms state-of-the-art proprietary systems on major search reasoning benchmarks. It achieves this without access to proprietary data or APIs, relying instead on smart agent design, flexible LLM pairing, and transparent tool integration. The authors publicly release ODS on GitHub, with the goal of catalyzing a new wave of open-source innovation in search AI.
Conclusion
Open Deep Search is a milestone for the open-source AI community. By combining high-quality search with agent-based reasoning in a plug-and-play framework, it demonstrates that open systems can rival and even surpass closed commercial offerings. With its modular design, support for any LLM, and transparent toolchain, ODS represents not just a technical achievement but a blueprint for the future of open AI development.