
How to build scalable research agents with SambaNova Cloud and Weave

Learn how to use W&B Weave and SambaNova Cloud to build research agents in this interactive tutorial
Created on April 22|Last edited on April 22
SambaNova Cloud offers pre-trained foundation models as a service, running on custom hardware optimized for high-performance inference. At the core is its Reconfigurable Dataflow Unit (RDU) architecture, which enables fast token throughput and is well-suited for large models used in research, reasoning, and complex problem solving.
In this blog, we’ll walk through how to build agents using SambaNova Cloud and track their behavior and performance with W&B Weave. Whether you're experimenting or scaling up, this setup is designed to give you visibility across your entire agentic workflow.
Let's get started.

Agent design

To handle a diverse set of user queries, we structured our system around four specialized agents, each designed to handle complex queries in its own domain. We'll focus on the research agent in this piece, but the system also includes a general assistant agent, a sales leads agent, and a financial analysis agent.
These agents are coordinated by a central planner that delegates tasks across the suite. SambaNova Cloud serves as the backbone of this application, allowing our agents to handle tens of thousands of tokens and enabling lightning-fast responses with high precision. The application also supports multimodal inputs, so you can interact with the agent via text, voice, or document upload.
As mentioned, in this blog, we will focus on the deep research abilities of this agent that allow you to query multiple online sources using APIs from Yahoo! and Google Finance for company information and Tavily and Exa for search.
Our agent is designed to make researching companies and compiling detailed reports quick and effortless.
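To make the fan-out pattern concrete, here is a minimal sketch of a research agent that queries several sources and merges the results into one report. The fetcher functions below are hypothetical stand-ins for the real Yahoo! Finance, Google Finance, Tavily, and Exa integrations, not code from the repository:

```python
from typing import Callable, Dict, List


def fetch_financials(company: str) -> str:
    # Stand-in for a Yahoo!/Google Finance lookup.
    return f"{company}: revenue and market-cap figures"


def fetch_web_research(company: str) -> str:
    # Stand-in for a Tavily/Exa web search.
    return f"{company}: recent news and analyst commentary"


def compile_report(company: str, sources: List[Callable[[str], str]]) -> Dict[str, str]:
    """Fan out to each source and merge the sections into one report."""
    sections = [source(company) for source in sources]
    return {"company": company, "report": "\n".join(sections)}
```

In the real application, each fetcher calls out to its provider's API, and the planner decides which sources are relevant to the query.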
The agent lifecycle follows a streamlined and modular process:
  1. User query processing
  2. Agent assignment
  3. Data retrieval and processing
  4. Response generation
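The four steps above can be sketched as a simple pipeline. The function names and routing logic here are illustrative, not taken from the repository:

```python
def process_query(raw_query: str) -> str:
    """1. User query processing: normalize the incoming request."""
    return raw_query.strip().lower()


def assign_agent(query: str) -> str:
    """2. Agent assignment: a central planner routes the query."""
    if "research" in query:
        return "research_agent"
    return "general_assistant"


def retrieve_data(agent: str, query: str) -> list[str]:
    """3. Data retrieval and processing: the agent gathers sources."""
    return [f"{agent} result for: {query}"]


def generate_response(snippets: list[str]) -> str:
    """4. Response generation: merge retrieved data into an answer."""
    return "\n".join(snippets)


def run(raw_query: str) -> str:
    query = process_query(raw_query)
    agent = assign_agent(query)
    return generate_response(retrieve_data(agent, query))
```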
Our entire agent workflow is tracked using Weave from Weights & Biases. Learn how it logs and visualizes agent steps here.
Here’s a high-level architecture of the agent workflow:


Let's build it

We first need to clone the repository with our agent. You can find that here.
gh repo clone wandb/sambanova-webinar
Follow the instructions in the repo to get your agent set up. Once setup is complete, sign in to your application and provide your SambaNova API key, which you can get here.
Accessing models served by SambaNova is straightforward, thanks to their API client’s compatibility with OpenAI’s client conventions. This makes integration seamless, allowing you to incorporate SambaNova-hosted models into your applications with minimal changes.
Here’s how to get started:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",
    api_key="YOUR SAMBANOVA CLOUD API KEY"
)
You can read more about their compatibility with OpenAI client libraries here.
Now that your agent is up and running, you can query the system with prompts like "research Meta and give me a financial analysis report and market risk overview for 2025."

Tracing the agent flow with Weave

Once the agents are running, tracking how each one operates is crucial. Weave makes this easy by providing full visibility into the agent lifecycle. To trace the flow of these agent crews, we add the @weave.op() decorator to the functions we want to track.
W&B Weave is a comprehensive toolkit designed to streamline the development and monitoring of GenAI applications. It enables developers to log, visualize, and debug every aspect of their AI workflows, from initial prompts to final outputs. This gives you deep insights into how your agents are performing and where improvements can be made—whether it’s refining prompt structure, changing model parameters, or identifying failure modes.
We use Weave to track this agent's flow as it is initiated to get the full context of all subsequent calls made to the agent to complete our request, including API calls to other services and our foundation models.
import weave

@weave.op(name='initialize_agent')
async def initialize_agent_runtime(
    redis_client: SecureRedisService,
    api_keys: APIKeys,
    user_id: str,
    conversation_id: str,
    websocket_manager: WebSocketInterface
) -> SingleThreadedAgentRuntime:
    """
    Initializes the agent runtime with the required agents and tools.

    Returns:
        SingleThreadedAgentRuntime: The initialized runtime for managing agents.
    """
In our codebase (backend/agent/crewai_llm.py), we wrap LLM calls, planner logic, and key agent functions with @weave.op() to automatically log inputs, outputs, latency, and even token usage. These traces are visualized in Weave’s UI, helping you debug failed tasks (e.g., missing API keys, model errors), monitor performance bottlenecks across agents, and compare execution paths and agent outputs over time.
import weave

class CustomLLM(LLM):
    @weave.op()
    def call(
        self,
        messages: Union[str, List[Dict[str, str]]],
        tools: Optional[List[dict]] = None,
        callbacks: Optional[List[Any]] = None,
        available_functions: Optional[Dict[str, Any]] = None,
    ) -> str:
        """
        High-level LLM call method that:
        1) Accepts either a string or a list of messages
        2) Converts string input to the required message format
        3) Calls litellm.completion
        4) Handles function/tool calls if any
        5) Returns the final text response or tool result
        """

Once we have Weave fully integrated into all the functions we want to track, we get the full trace view of the agent execution and handling of our tasks.
For example, when a financial analysis agent fails to return data due to a broken API call, Weave helps you trace exactly where the failure happened and what input caused it by surfacing the trace tree and execution timeline.

We can even track the execution of specific agents to understand how each agent performs:

By adding the @weave.op() decorator to key functions (LLM calls, planner logic, and external API interactions), we gain full visibility into each step of the agent's execution: prompt inputs and outputs, function arguments, response latency, token usage, and much more. Weave traces are detailed enough to capture the agent code itself, helping your team debug, iterate, and improve your AI agents over time.

Conclusion

Building, evaluating, and optimizing intelligent AI agents is complex, but it doesn't have to feel like a black box. With SambaNova Cloud, you get the performance and scalability needed to run large foundation models that power these agents, and with W&B Weave, you gain full visibility into how those agents operate—step by step.
In this blog, we explored how to design a modular agent architecture, integrate it with models served through SambaNova’s high-throughput RDUs, and trace each interaction using Weave. Whether you're building tools for deep research, finance, or other domains, this stack gives you the infrastructure to move fast without losing insight.
Now that you've seen how it works, try it out for yourself!
Join the Weights & Biases and SambaNova teams for a live webinar on April 30th at 8 AM PT, where we’ll build and run this agent step-by-step—live. Don’t miss the chance to learn directly from the teams behind the tools. You can sign up by clicking the button below.
SIGN UP FOR OUR WEBINAR
