
Debugging CrewAI multi-agent applications

Build and debug AI agents faster with CrewAI and W&B Weave. Monitor, analyze, and optimize every step of your multi-agent workflows.
Created on July 2 | Last edited on July 3
Debugging multi-agent AI applications is a significant challenge. As agent-based workflows are used to solve more complex and realistic tasks, developers face increasing pressure to observe, understand, and optimize the operation of each agent. CrewAI provides a sophisticated framework for building and coordinating teams of specialized AI agents. On top of that, W&B Weave brings much-needed transparency to these workflows by letting you monitor, analyze, and replay every step of each agent’s decision-making process.
In this article, you will see how combining CrewAI and Weave equips you with a way to not only build agentic systems but also to continually debug and enhance them. As a practical example, we will explore a log-analysis agent that reads through error logs, uncovers root causes, creates effective search queries, scans both GitHub issues and the wider web for potential solutions, examines the affected code files, and finally generates a human-friendly HTML debug report.
This process happens automatically, and every agent step is fully visible thanks to Weave. Whether you are new to multi-agent LLM systems or seeking to improve your debugging strategies, this guide will show you how to iterate faster with greater confidence.
If you're the impatient type (or just want to dive straight into the dang code), you can:
Jump to the tutorial







Understanding CrewAI and its multi-agent capabilities

As AI automation continues to evolve, it's becoming clear that many real-world problems are too complex for a single, monolithic model to handle effectively. The most robust results often emerge when multiple specialized AI agents collaborate, each focusing on a distinct part of the workflow. CrewAI is designed to make this agent-based approach accessible and practical for anyone building with language models.
At its heart, CrewAI is based on the principle that complex challenges are more manageable when broken down into smaller parts, with dedicated agents responsible for each one. This concept mirrors how human teams assign roles and distribute responsibilities. Rather than pouring every requirement into one oversized prompt, CrewAI allows developers to define specific roles for each agent, outline their tasks, and coordinate their collaboration within a larger workflow.
Each AI agent can be independently inspected, tested, and improved. As a result, the whole system is easier to maintain and adapt when requirements evolve. Switching from a single-model approach to a structured, agent-based setup helps developers build AI solutions that are more reliable, transparent, and extensible.
In short, CrewAI provides a framework for orchestrating multiple AI specialist agents, each bringing their own expertise to the table and collaborating to solve complex tasks in a clear and manageable manner.

Key agentic features of CrewAI

CrewAI offers a practical set of features that enable effective management of multi-agent AI workflows. At its foundation, CrewAI allows you to define agents with specific roles, clear boundaries, and assigned tools. Each agent is configured to focus on a distinct part of your workflow, ensuring clarity and efficiency across the system.
Task management is another core capability. CrewAI enables you to assign tasks to agents, specify their dependencies, and control the flow of information through the workflow. Workflows can be set to run in sequence or in parallel, providing flexibility for both simple and complex use cases. This structure supports seamless data exchange and collaboration among agents.
Agents can access a variety of external tools and APIs, enabling them to integrate and process information that extends well beyond their built-in language model capabilities. This extensibility makes agents adaptable and capable of handling real-world, evolving requirements. CrewAI supports two methods for configuring agents and workflows. You can define everything directly in Python code for quick experimentation and easier debugging. Alternatively, you can use YAML configuration files, which are especially useful for managing larger projects where agent definitions may need to be updated frequently or reused.
With its focus on modularity, robust task coordination, flexible integrations, and a choice of configuration methods, CrewAI provides a solid foundation for building transparent, maintainable, and scalable AI agent systems.
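To make the structure concrete, here is a minimal sketch of how an agent, a task, and a crew fit together in CrewAI's Python API (the role, goal, and task text here are illustrative placeholders, not part of the debugging project we build below):
from crewai import Agent, Task, Crew, Process

# An agent with a narrow role, an explicit goal, and some backstory for context
summarizer = Agent(
    role="Release Notes Summarizer",
    goal="Turn raw changelog entries into short, user-facing release notes",
    backstory="A technical writer who distills engineering changes for end users",
    allow_delegation=False,
)

# A task assigned to that agent, with the expected output spelled out
summarize_task = Task(
    description="Summarize the provided changelog into five bullet points.",
    expected_output="Five concise bullet points covering the main changes",
    agent=summarizer,
)

# The crew wires agents and tasks together and controls execution order
crew = Crew(agents=[summarizer], tasks=[summarize_task], process=Process.sequential)
result = crew.kickoff()
The same pattern scales to multiple agents and tasks, with each task passing its output forward as context, which is exactly how the debugging crew later in this article is organized.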

Introducing W&B Weave for enhanced LLM observability

W&B Weave provides a powerful layer of observability for multi-agent systems, directly integrated with CrewAI for a seamless user experience. With just a simple import and initialization of Weave in your project, every step of your agents’ activity is automatically captured and logged.
As we'll see below, this comprehensive observability enables you to visualize each step of an agent’s process, track decision paths, and monitor the flow of information between agents in real time. Weave lets you review actions taken by agents, inspect detailed logs, and pinpoint exactly where and why certain decisions were made within your workflow. This level of visibility is crucial for debugging complex pipelines, diagnosing issues, and making sure your system delivers reliable results.
With Weave integrated with CrewAI, developers gain immediate insight into the inner workings of their agent teams. This clear view enables faster iteration, more confident deployment, and a smoother path to continuous improvement. Instead of treating observability as a challenge, Weave turns it into a strength. You will be able to not only build smarter and more effective agents but also deeply understand and optimize how they operate at every stage of your workflow.
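For reference, the integration really is that small; here is a sketch of the two pieces involved, assuming the weave package is installed and you are logged in to W&B:
import weave

# Initialize a Weave project once; supported frameworks such as CrewAI are traced automatically
weave.init("crewai_debug_agent")

# Optionally decorate your own helper functions so they show up as spans in the same trace
@weave.op()
def normalize_log(raw: str) -> str:
    return raw.strip()
Everything else in this article's traces comes from that single weave.init call sitting near the top of the agent script.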

Monitoring cost, token usage, and agent failures

W&B Weave gives developers detailed tools to monitor essential operational metrics for multi-agent systems, including costs, token usage, latency, and agent failures. These metrics are displayed in clear dashboards, making it easy to track both overall system health and the performance of individual agents.
With Weave, you can see the exact costs and token usage breakdown for each model your agents use. This transparency helps you pinpoint which models or processes are driving expenses, allowing you to make more informed choices about when to use large, powerful models and when a more cost-effective option might suffice. By closely monitoring token usage per agent and per model, you can prevent unexpected spikes and fine-tune workflows to strike a balance between quality and affordability.

Latency tracking allows you to detect bottlenecks and optimize response times across your agentic pipeline. You can quickly spot which agent or model is slowing things down, enabling faster debugging and system tuning. Additionally, Weave tracks agent failures and errors at every workflow step.
If an agent encounters an issue or a model request fails, Weave logs the event with detailed context. This makes it straightforward to identify, diagnose, and fix problems without manually digging through logs. Monitoring these metrics is crucial for improving agent system performance and keeping operational costs under control. With W&B Weave, you have a clear view into every aspect of your agentic system's resource usage and reliability. This enables proactive optimization, more predictable costs, and a much smoother path to scaling your AI workflows with confidence.
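If you want to slice these views by your own labels (for example, per environment or per run type), Weave also lets you attach attributes to the calls it records. Here is a small sketch; the attribute keys and the crew object are illustrative:
import weave

weave.init("crewai_debug_agent")

# Metadata attached here is recorded on every call traced inside the block,
# so cost, token, and latency views can later be filtered by these keys.
with weave.attributes({"run_kind": "nightly-debug", "env": "dev"}):
    result = crew.kickoff()  # assumes a Crew named `crew` has already been defined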


Tutorial: Building a code debugger agent with CrewAI

To illustrate, in this tutorial we will develop a code debugger agent specifically designed for Python. Debugging Python applications often requires searching both the web and GitHub to fully understand the nature of an error. This is because Python's error output, written to stderr (short for "standard error"), typically includes a traceback and other diagnostic details but may not provide enough context or immediate solutions.
Stderr is the output stream where Python sends its error messages when an error occurs in your code. These messages help identify the file and line number involved, but developers still have to research what the error actually means and how others have resolved it.
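As a quick illustration of the stream we are about to capture, a script's normal output goes to stdout while diagnostics and tracebacks go to stderr, which is what allows the two to be separated:
import sys

print("normal output goes to stdout")                       # captured by > redirection
print("diagnostic output goes to stderr", file=sys.stderr)  # captured by 2> redirection

# An uncaught exception is also written to stderr as a traceback:
raise ValueError("this traceback ends up on stderr, not stdout")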
Our CrewAI-based agent will automatically analyze Python stderr output, identify the root cause of the error, and generate targeted queries to find relevant information across sites such as Stack Overflow, official documentation, and GitHub repositories. The agent will then consolidate its findings and produce a detailed debugging report.
Along with showcasing how to build and structure such an agent using CrewAI, this tutorial will demonstrate how to use W&B Weave for observing every step in the process, from resource usage to tracing the agent’s reasoning. This approach will help you streamline your debugging workflow while keeping everything transparent and easy to optimize.

Step 1: Creating a logging system using a bash alias

To start building our Python code debugger agent, we first need a way to consistently capture error messages from Python scripts. This makes it much easier for our agent to analyze what went wrong.
In this step, we create a Bash function, which you can save in your shell profile (such as .bashrc or .zshrc) and use in place of the typical python command. Here’s how it works:
agentpython() {
logfile="/tmp/agentpython-stderr.log"
python "$@" 2> >(tee "$logfile" >&2)
if [[ -s "$logfile" ]]; then
# If logfile is NOT empty, run check script
python /Users/brettyoung/Desktop/dev25/tutorials/dbg_crw/debug_main.py "$logfile"
else
# If logfile is empty, clear it (truncate to zero length)
> "$logfile"
fi
}
To add this, you can use the following command, which will append the function to your shell configuration file. Note: you will need to replace full_path_to_your_script with the full path to the debugging agent script on your system before running this command (we will create that script in Step 3):
profile_file=$(test -f ~/.zshrc && echo ~/.zshrc || (test -f ~/.bashrc && echo ~/.bashrc)); echo 'agentpython() {
logfile="/tmp/agentpython-stderr.log"
python "$@" 2> >(tee "$logfile" >&2)
if [[ -s "$logfile" ]]; then
python full_path_to_your_script "$logfile"
else
> "$logfile"
fi
}' >> "$profile_file" && source "$profile_file" && echo "Added and sourced $profile_file"
How this works:
  • When you run agentpython myscript.py, the function executes your Python script.
  • Any errors or tracebacks that would normally print to the terminal (stderr) are also written to a log file at /tmp/agentpython-stderr.log.
  • If the script runs without errors, the log file gets cleared.
  • If errors occur, the log file remains populated, and the function then automatically passes this log to your debugging agent script (debug_main.py), which analyzes the error output.
This logging system ensures all stderr output from your Python run is captured and ready for immediate analysis, laying the groundwork for the rest of your debugging workflow.

Step 2: Creating a "buggy" script to test with

Now that you have a logging system for capturing Python errors, the next step is to create a Python script that reliably causes an error. This will be the test script your debugging agent will analyze.
Here is an example script that uses NumPy and intentionally corrupts a buffer. This is likely to trigger a confusing or severe error, such as a NumPy buffer error or a segmentation fault:
import numpy as np

# Create a structured array
dt = np.dtype([('x', 'f8'), ('y', 'i4')])
arr = np.zeros(100, dtype=dt)

# Fill with data
arr['x'] = np.random.random(100)
arr['y'] = np.arange(100)

# Create problematic buffer view
buffer_data = arr.tobytes()[:-5] # Truncated buffer

# This triggers a numpy buffer/memory bug
corrupted = np.frombuffer(buffer_data, dtype=np.complex128, count=-1)

# Try to use the corrupted array - this often segfaults
result = np.fft.fft(corrupted) * np.ones(len(corrupted))
print(f"Result shape: {result.shape}")
Save this file as bad_code.py (or any name you like). When you run it with your new logging command (for example, agentpython bad_code.py), any error output will be captured by your Bash logging system. This gives your CrewAI-based agent a real-world scenario to analyze in the next steps. By starting with a test script that always causes an error, you make it easier to test and refine your Python debugging automation.

Step 3: Building out our agent with CrewAI and Weave

Now we’ll build our debugging agent using CrewAI and Weave. The process begins by reading the Python stderr log file you set up in Step 1. Once the agent loads this log, it analyzes the error, identifies the specific files and lines involved, and generates a search query based on the identified problem. This search query is then automatically used to look for solutions both on the web and GitHub. Relevant code snippets are identified from your project to provide deeper insight, and the agent ultimately generates a detailed debugging report with links and recommended solutions.
To implement this workflow, we first import all necessary libraries and tools, then initialize Weave for logging and tracking. Reading the log content occurs immediately, allowing all agents to access this information. We then define a series of agents using CrewAI, each focused on a part of the process: one for log analysis, one for crafting search queries, another for searching code repositories and the web, and others for reviewing code snippets and assembling the report. Each task feeds its output into the next, allowing results from file analysis, web searches, and code reviews to combine into a complete and practical debugging session. Here is the code:
import os
import sys
import re
import requests
import tempfile
import webbrowser
import html
from pathlib import Path
from typing import Type, List, Optional, Dict, Any, Union
import json

from pydantic import BaseModel, Field

from crewai import Agent, Task, Crew, Process
from crewai.tools import BaseTool
from crewai import BaseLLM, LLM
import weave; weave.init("crewai_debug_agent")

from langchain_openai import ChatOpenAI
import os
import re
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Type
import subprocess


LOGFILE = sys.argv[1] if len(sys.argv) > 1 else "/tmp/agentpython-stderr.log"
GITHUB_TOKEN = os.getenv('GITHUB_TOKEN')

# Read the log file BEFORE kicking off
LOG_CONTENT = ""
if os.path.exists(LOGFILE) and os.path.getsize(LOGFILE) > 0:
with open(LOGFILE, 'r') as f:
LOG_CONTENT = f.read()
print(f"\033[95m[LOG CONTENT LOADED] {len(LOG_CONTENT)} characters from {LOGFILE}\033[0m")
else:
LOG_CONTENT = "No log file found or file is empty"
print(f"\033[95m[LOG] No content found in {LOGFILE}\033[0m")

def verbose_print(msg):
print(f"\033[95m[LOG] {msg}\033[0m", flush=True)



# ----- Tool Input Schemas -----
class LogAnalysisInput(BaseModel):
log_content: str = Field(..., description="Log content to analyze (already loaded)")

class SearchQueryInput(BaseModel):
error_text: str = Field(..., description="Error text to generate search query from")

class CombinedSearchInput(BaseModel):
query: str = Field(..., description="Search query for both GitHub issues and web")
owner: str = Field(default="", description="GitHub repository owner")
repo: str = Field(default="", description="GitHub repository name")

class FileAnalysisInput(BaseModel):
log_content: str = Field(..., description="Log content to extract file information from")

class FileSnippetInput(BaseModel):
file_path: str = Field(..., description="Path to the file to get snippet from")
line: Optional[int] = Field(default=None, description="Line number to focus on")
n_lines: int = Field(default=20, description="Number of lines to return")

class ToolSuggestionInput(BaseModel):
error_message: str = Field(..., description="Error message to analyze")
code_snippet: str = Field(..., description="Code snippet related to the error")

class ReportGenerationInput(BaseModel):
log: str = Field(..., description="Error log content")
file_snippet: str = Field(default="", description="Relevant code snippet")
tools: str = Field(default="", description="Tool recommendations")
gh_results: str = Field(default="", description="GitHub search results")
web_results: str = Field(default="", description="Web search results")

# ----- Tools -----

class LogReaderTool(BaseTool):
name: str = Field(default="Log Reader")
description: str = Field(default="Provides access to the pre-loaded log content")
args_schema: Type[BaseModel] = LogAnalysisInput
def _run(self, log_content: str = None) -> str:
verbose_print(f"Using pre-loaded log content")
if not LOG_CONTENT or LOG_CONTENT == "No log file found or file is empty":
return "[LOG] Log file empty or not found. No action needed."
is_python_error = "Traceback" in LOG_CONTENT or "Exception" in LOG_CONTENT or "Error" in LOG_CONTENT
error_type = "Python Error" if is_python_error else "General Error"
return f"Error Type: {error_type}\n\nLog Content:\n{LOG_CONTENT}"

class SearchQueryGeneratorTool(BaseTool):
name: str = Field(default="Search Query Generator")
description: str = Field(default="Generates optimized search queries from error messages")
args_schema: Type[BaseModel] = SearchQueryInput
def _run(self, error_text: str) -> str:
verbose_print("Generating search query via LLM...")
try:
prompt = (
"Given this error or question, write a concise search query to help the person find a solution online. "
"Output only the query (no explanation):\n\n" + error_text
)
query = llm.call(prompt)
return f"Generated search query: {query.strip()}"
except Exception as e:
return f"Error generating search query: {str(e)}"




class CombinedSearchTool(BaseTool):
name: str = Field(default="Combined GitHub & Web Search")
description: str = Field(default="Searches both GitHub issues and the web in one call, returning both results.")
args_schema: Type[BaseModel] = CombinedSearchInput

def _run(self, query: str, owner: str = "", repo: str = "") -> dict:
github_results = self._github_search(query, owner, repo)
web_results = self._web_search(query)
return {
"github_issues": github_results,
"web_search": web_results
}

def _github_search(self, query: str, owner: str, repo: str):
import httpx
GITHUB_TOKEN = os.getenv('GITHUB_TOKEN')
url = 'https://api.github.com/search/issues'
headers = {'Accept': 'application/vnd.github.v3+json'}
if GITHUB_TOKEN:
headers['Authorization'] = f'token {GITHUB_TOKEN}'
gh_query = f'repo:{owner}/{repo} is:issue {query}' if owner and repo else query
params = {'q': gh_query, 'per_page': 5}
try:
with httpx.Client(timeout=15) as client:
resp = client.get(url, headers=headers, params=params)
if resp.status_code == 200:
items = resp.json().get("items", [])
return [
{
"number": item.get("number"),
"title": item.get("title"),
"url": item.get("html_url"),
"body": (item.get("body") or "")[:500]
}
for item in items
]
else:
return [{"error": f"GitHub search failed: {resp.status_code} {resp.text}"}]
except Exception as e:
return [{"error": f"Error searching GitHub: {str(e)}"}]

def _extract_json(self, text):
m = re.search(r"```json\s*(.*?)\s*```", text, re.DOTALL)
if not m:
m = re.search(r"```(.*?)```", text, re.DOTALL)
block = m.group(1) if m else text
try:
j = json.loads(block)
return j if isinstance(j, list) else [j]
except Exception:
return []
def _web_search(self, query: str, n_results: int = 5):
# Your actual OpenAI-based tool call here
from openai import OpenAI # or however your actual OpenAI client is imported
client = OpenAI()
prompt = (
f"Show me {n_results} of the most important/useful web results for this search along with a summary of the problem and proposed solution: '{query}'. "
"Return as markdown JSON:\n"
"[{\"title\": ..., \"url\": ..., \"date_published\": ..., \"snippet\": ...}]"
)
response = client.responses.create(
model="gpt-4.1", # or "gpt-4.1", or your available web-enabled model
tools=[{"type": "web_search_preview"}],
input=prompt,
)
return self._extract_json(response.output_text)




class FileAnalysisTool(BaseTool):
name: str = Field(default="File Analysis")
description: str = Field(default="Extracts file paths and line numbers from error logs")
args_schema: Type[BaseModel] = FileAnalysisInput
def _run(self, log_content: str = None) -> str:
verbose_print("Invoking LLM to identify files from log...")
# Use the global LOG_CONTENT if log_content not provided
content_to_analyze = log_content or LOG_CONTENT
try:
prompt = (
"Given this error message or traceback, list all file paths (and, if available, line numbers) involved in the error. "
"Output one JSON per line, as:\n"
'{"file": "path/to/file.py", "line": 123}\n'
'If line is not found, use null.\n'
f"\nError:\n{content_to_analyze}"
)
output = llm.call(prompt)
results = []
for l in output.splitlines():
l = l.strip()
if not l:
continue
try:
results.append(eval(l, {"null": None}))
except Exception as exc:
verbose_print(f"[File Extraction Skipped Line]: {l!r} ({exc})")
return f"Files found in error: {results}"
except Exception as e:
return f"Error analyzing files: {str(e)}"

class FileSnippetTool(BaseTool):
name: str = Field(default="File Snippet Extractor")
description: str = Field(default="Extracts code snippets from files around specific lines")
args_schema: Type[BaseModel] = FileSnippetInput
def _run(self, file_path: str, line: Optional[int] = None, n_lines: int = 20) -> str:
if not os.path.exists(file_path):
return f"File not found: {file_path}"
try:
with open(file_path, "r") as f:
lines = f.readlines()
if line and 1 <= line <= len(lines):
s = max(0, line-6)
e = min(len(lines), line+5)
code = lines[s:e]
else:
code = lines[:n_lines]
return f"Code snippet from {file_path}:\n{''.join(code)}"
except Exception as e:
return f"Error reading file {file_path}: {str(e)}"

class ToolSuggestionTool(BaseTool):
name: str = Field(default="Tool Suggestion")
description: str = Field(default="Suggests which debugging tools to use next based on error analysis")
args_schema: Type[BaseModel] = ToolSuggestionInput
def _run(self, error_message: str, code_snippet: str) -> str:
verbose_print("Requesting tool suggestions via LLM...")
prompt = (
"You are an AI debugging orchestrator. The following is a Python error message and a snippet of code "
"from a file involved in the error. Based on this, choose which tools should be used next, and explain why. "
"Possible tools: github_issue_search, web_search. "
"Always recommend github_issue_search as it's very helpful. "
"Provide your recommendation in a clear, structured format.\n"
"Error:\n" + error_message + "\n\nFile snippet:\n" + code_snippet
)
try:
return llm.call(prompt).strip()
except Exception as e:
return f"Error generating tool suggestions: {str(e)}"

class ReportGeneratorTool(BaseTool):
name: str = Field(default="HTML Report Generator")
description: str = Field(default="Generates HTML debug reports")
args_schema: Type[BaseModel] = ReportGenerationInput
def _run(self, log: str, file_snippet: str = "", tools: str = "", gh_results: str = "", web_results: str = "") -> str:
verbose_print("Writing HTML report ...")
out_path = os.path.join(tempfile.gettempdir(), 'dbg_report.html')
try:
with open(out_path, "w", encoding="utf-8") as f:
f.write("<html><head><meta charset='utf-8'><title>Debug Results</title></head><body>\n")
f.write("<h1 style='color:#444;'>Debugging Session Report</h1>\n")
f.write("<h2>Error Log</h2>")
f.write("<pre style='background:#f3f3f3;padding:8px;'>" + html.escape(log or "None") + "</pre>")
if file_snippet:
f.write("<h2>Relevant Source Snippet</h2><pre style='background:#fafaff;padding:8px;'>" + html.escape(file_snippet) + "</pre>")
if tools:
f.write("<h2>LLM Tool Recommendations</h2><pre style='background:#eef;'>" + html.escape(tools) + "</pre>")
if gh_results:
f.write("<h2>GitHub & Web Search Results</h2><pre>" + html.escape(gh_results) + "</pre>")
if web_results:
f.write("<h2>Web Search AI Answer</h2><pre>" + html.escape(web_results) + "</pre>")
f.write("</body></html>")
return f"HTML report generated and opened at: {out_path}"
except Exception as e:
return f"Error generating HTML report: {str(e)}"

# --- Tool Instances
log_reader_tool = LogReaderTool()
search_query_generator_tool = SearchQueryGeneratorTool()
combined_search_tool = CombinedSearchTool()
file_analysis_tool = FileAnalysisTool()
file_snippet_tool = FileSnippetTool()
tool_suggestion_tool = ToolSuggestionTool()
report_generator_tool = ReportGeneratorTool()


class CustomChatOpenAI(ChatOpenAI):
def call(self, prompt, system_message=None):
"""
Run inference on a prompt (string). Optionally provide a system message.
Args:
prompt (str): The user's message.
system_message (str, optional): The system context for the assistant.
Returns:
str: The model's response content.
"""
messages = []
if system_message:
messages.append(("system", system_message))
messages.append(("human", prompt))
result = self.invoke(messages)
return result.content


llm = CustomChatOpenAI(model_name="gpt-4o-mini", temperature=0.3)
fouro_llm = CustomChatOpenAI(model_name="gpt-4o-mini", temperature=0.3)



# --- Agents ---
log_analyst_agent = Agent(
role="Log Analysis Specialist",
goal="Analyze the pre-loaded log content to identify errors and extract relevant information. Start by reading log with the log_reader_tool, then move on to usign the file_alaysis_tool to read important info from the file(s) involved in the error",
backstory="Expert in parsing error logs and identifying the root causes of issues",
tools=[log_reader_tool, file_analysis_tool],
allow_delegation=False,
llm=fouro_llm
)

search_specialist_agent = Agent(
role="Search Query Specialist",
goal="Generate optimized search queries from error messages for effective problem resolution. The search must be less that 100 chars long!!!!!!!!!!",
backstory="Expert in crafting search queries that yield the most relevant debugging results. The search must be less that 100 chars long!!!!!!!!!!",
tools=[search_query_generator_tool],
allow_delegation=False,
llm=fouro_llm
)


combined_research_agent = Agent(
role="Combined Repository and Web Search Specialist",
goal="Search both GitHub issues and the web for relevant solutions to errors and problems. You must use the combined_search_tool no matter what!!!!!!! Try to summarize each specific github/web problem and solution to help the user solve their issue. Make sure to include the links from the original sources next to their corresponding summaries / code etc",
backstory="Expert in both GitHub open-source research and web documentation sleuthing for code solutions.",
tools=[combined_search_tool],
allow_delegation=False,
llm=fouro_llm
)

code_analyst_agent = Agent(
role="Code Analysis Specialist",
goal="Analyze code snippets and suggest debugging approaches",
backstory="Expert in code analysis and debugging strategy recommendation",
tools=[file_snippet_tool],
allow_delegation=False,
llm=fouro_llm
)

report_generator_agent = Agent(
role="Debug Report Generator",
goal="Compile all debugging information into comprehensive HTML reports. Make sure to include the links to sources when they are provides -- but DO NOT make up links if they are not given. Write an extensive report covering all possible solutions to the problem!!!",
backstory="Specialist in creating detailed, actionable debugging reports",
tools=[report_generator_tool],
allow_delegation=False,
llm=llm
)

# --- Tasks ---
log_analysis_task = Task(
description=f"Analyze the pre-loaded log content. The log content is already available: {LOG_CONTENT[:500]}... Extract error information and identify the type of error.",
expected_output="Detailed analysis of the log content including error type and content",
agent=log_analyst_agent,
output_file="log_analysis.md"
)

file_extraction_task = Task(
description="Extract file paths and line numbers from the analyzed log content. Use the pre-loaded log content to identify which files are involved in the error.",
expected_output="List of files and line numbers involved in the error",
agent=log_analyst_agent,
context=[log_analysis_task],
output_file="file_analysis.md"
)

search_query_task = Task(
description="Generate optimized search queries based on the error analysis for finding solutions online. The search must be less that 100 chars long!!!!!!!!!!",
expected_output="Optimized search queries for the identified errors. The search must be less that 100 chars long!!!!!!!!!!",
agent=search_specialist_agent,
context=[log_analysis_task],
output_file="search_queries.md"
)

combined_search_task = Task(
description="Use the search queries to search both GitHub issues and the wide web for solutions. Make sure to make a very robust report incorporating ALL sources. Dont just give desciptions of the issue- write a detailed summary showcasing code and exact explanations to issues in the report.",
expected_output="Relevant GitHub issues and web documentation/articles/answers.",
agent=combined_research_agent,
context=[search_query_task],
output_file="combined_results.md"
)

code_analysis_task = Task(
description="Extract and analyze code snippets from the implicated files. Suggest debugging tools and approaches.",
expected_output="Code snippets and debugging tool recommendations",
agent=code_analyst_agent,
context=[file_extraction_task],
output_file="code_analysis.md"
)

report_generation_task = Task(
description="Compile all debugging information into a comprehensive HTML report and open it in the browser. Make sure to make a very robust report incorporating ALL sources Make sure to include the links to sources when they are provides -- but DO NOT make up links if they are not given. -- ALL sourced information must be cited!!!!!! Write an extensive report covering all possible solutions to the problem!!!",
expected_output="Complete HTML debugging report",
agent=report_generator_agent,
context=[log_analysis_task, combined_search_task, code_analysis_task],
output_file="debug_report.html"
)

# --- Run Crew ---
crew = Crew(
agents=[
log_analyst_agent,
search_specialist_agent,
combined_research_agent,
code_analyst_agent,
report_generator_agent
],
tasks=[
log_analysis_task,
file_extraction_task,
search_query_task,
combined_search_task,
code_analysis_task,
report_generation_task
],
process=Process.sequential,
verbose=True
)

if __name__ == "__main__":
print(f"\033[95m[STARTING] Log content loaded: {len(LOG_CONTENT)} chars\033[0m")
result = crew.kickoff()
print("\n\nDebug Analysis Complete:\n")
print(result)
# Try to open the generated report
report_path = './debug_report.html'
if os.path.exists(report_path):
verbose_print(f"Opening final report: {report_path}")
if sys.platform.startswith("darwin"):
subprocess.Popen(['open', report_path])
elif sys.platform.startswith("linux"):
subprocess.Popen(['xdg-open', report_path])
elif sys.platform.startswith("win"):
os.startfile(report_path)

In this crew, we set up a group of specialized agents and a workflow that automates error analysis and troubleshooting for Python code. Our main agents are: a Log Analyst, a Search Query Specialist, a Combined Researcher, a Code Analyst, and a Report Generator. A bit about each:
  • The Log Analyst reads the error log and identifies key issues and affected files.
  • The Search Query Specialist takes this information and crafts concise queries that are useful for searching solutions on the web.
  • The Combined Researcher utilizes these queries to simultaneously search both GitHub issues and broader web sources, thereby gathering the most relevant discussion threads, code samples, and answers.
  • The Code Analyst extracts code snippets from the implicated files to help pinpoint where issues are occurring and suggest specific debugging actions.
  • Finally, the Report Generator collects all the information and generates a clear, easy-to-read HTML report with explanations, links, and suggested fixes.
These agents each perform a specific task in sequence. Each agent uses dedicated tools and leverages large language models for analysis and summarization. As the output from one agent feeds into the next task, the flow captures, analyzes, researches, and reports on any error, making the entire debugging process faster and more thorough for the developer.
Thanks to Weave, you get a clear, interactive trace showing every action taken by each agent and every model response throughout the debugging process. You can review how errors were analyzed, which search queries were generated, and what results were pulled from the web and GitHub. This transparency makes it easy to follow the agent’s reasoning and helps you understand and improve your own debugging workflow.
After creating my agent, I noticed a few issues related to how the agent utilizes some of its tools. Essentially, some of my tools made LLM inference calls through a model object I had created, which was an instance of LangChain's ChatOpenAI class. The issue was that I was using a .call method (which doesn't actually exist on this class), and somehow, CrewAI was not clearly surfacing this error to me. Luckily, because I was using Weave, I could analyze these calls in the Weave traces dashboard and see the error clearly:

To solve this, I needed to add a .call method to my ChatOpenAI model instance. The LangChain ChatOpenAI class natively supports the .invoke() method but does not provide a .call() method by default. This was causing issues in my tools and agents that expected a .call() interface. I chose to resolve this by subclassing ChatOpenAI and adding a simple .call method that wraps the necessary message formatting and delegates the request to .invoke(). Here's the code to do this:
class CustomChatOpenAI(ChatOpenAI):
def call(self, prompt, system_message=None):
"""
Run inference on a prompt (string). Optionally provide a system message.
Args:
prompt (str): The user's message.
system_message (str, optional): The system context for the assistant.
Returns:
str: The model's response content.
"""
messages = []
if system_message:
messages.append(("system", system_message))
messages.append(("human", prompt))
result = self.invoke(messages)
return result.content
After conducting further analysis and testing with my agent, one major issue that stood out was the high latency of my agent. Every single run of my agent took around 2 minutes, which is extremely slow compared to manually researching the error. Inside Weave, I could clearly see the overall latency of the agent, along with the latencies for each call to the agent.
Because of this high latency, I decided to search for a faster LLM. Utilizing a service called OpenRouter, I gained access to Qwen 3 32B, hosted through Cerebras, which builds ultra-fast LLM accelerator chips. To utilize this model, I needed to first implement a backend service that would enable my agent to interact with Qwen 3 32B in an OpenAI-compatible manner. I built a simple FastAPI app that essentially acts as a proxy. This API receives OpenAI-style /v1/chat/completions requests and forwards them to OpenRouter, specifically configuring the backend to guarantee that requests are routed to the Cerebras-hosted Qwen model.
Here’s the code for hosting the model through localhost:
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
import uvicorn
import os
from typing import List, Dict, Any, Optional, Union
import requests

# Your re-used settings
API_KEY = os.getenv("OPENROUTER_API_KEY") or "your_api_key"
SITE_URL = "https://your-site-url.com"
SITE_NAME = "Your Site Name"
MODEL = "qwen/qwen3-32b"

# OpenRouterCerebrasLLM class: wraps requests to OpenRouter and pins the provider to Cerebras
class OpenRouterCerebrasLLM:
def __init__(
self,
model: str,
api_key: str,
site_url: str,
site_name: str,
temperature: Optional[float] = None,
):
self.model = model
self.temperature = temperature
self.api_key = api_key
self.site_url = site_url
self.site_name = site_name
self.endpoint = "https://openrouter.ai/api/v1/chat/completions"
def call(
self,
messages: Union[str, List[Dict[str, str]]],
tools: Optional[List[dict]] = None,
callbacks: Optional[List[Any]] = None,
available_functions: Optional[Dict[str, Any]] = None,
) -> Union[str, Any]:
if isinstance(messages, str):
messages = [{"role": "user", "content": messages}]
payload = {
"model": self.model,
"messages": messages,
"temperature": self.temperature,
"provider": {
"order": ["cerebras"],
"allow_fallbacks": False
}
}
headers = {
"Authorization": f"Bearer {self.api_key}",
"HTTP-Referer": self.site_url,
"X-Title": self.site_name,
"Content-Type": "application/json"
}
response = requests.post(
self.endpoint,
headers=headers,
json=payload,
timeout=30
)
response.raise_for_status()
result = response.json()
return result

# Initialize the FastAPI app and the LLM once
app = FastAPI()
llm = OpenRouterCerebrasLLM(
model=MODEL,
api_key=API_KEY,
site_url=SITE_URL,
site_name=SITE_NAME,
temperature=0.7
)

@app.post("/v1/chat/completions")
async def chat_completions(request: Request):
body = await request.json()
messages = body.get("messages")
# you can also handle tools, temperature, etc. here
try:
raw_response = llm.call(messages)
# This returns the full OpenRouter response. If you want to narrow down to only the OpenAI-compatible fields,
# you could filter here, but for maximum compatibility just return as-is.
return JSONResponse(raw_response)
except Exception as e:
return JSONResponse({"error": str(e)}, status_code=500)

if __name__ == "__main__":
print(f"Serving local OpenAI-compatible LLM proxy on http://localhost:8001/v1/chat/completions")
print(f"Forwarding all requests to: {MODEL}, via OpenRouter w/ your secret/settings")
uvicorn.run(app, host="0.0.0.0", port=8001)
The core of this setup is an OpenRouterCerebrasLLM class, which wraps all the details of constructing the request to the OpenRouter endpoint. It takes care of inserting the correct authentication headers, the model name, temperature, and provider order, so that all requests are guaranteed to reach the desired Qwen instance. The call method on this class can accept user or system prompts, then package and deliver them as a payload to OpenRouter's API and return the JSON response.
In my FastAPI handler, I simply accept incoming POST requests, extract the messages list, and pass them directly to the Qwen model. The returned payload, which mirrors OpenAI's response format, can then be sent directly back to the client. If anything goes wrong in the chain from client to OpenRouter, I return a well-formed error message as JSON.
With the API server running locally, I could configure my agent to connect to http://localhost:8001/v1/chat/completions as if it was talking to the OpenAI API. In practice, this setup made it seamless for my agent, and for any OpenAI-compatible libraries, to use Qwen 3 32B with almost no code changes required outside of the endpoint URL.
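In CrewAI terms, that amounts to overriding the base URL on the LLM object, which is exactly what the full script below does. A minimal sketch (the api_key value is a placeholder, since the real OpenRouter key lives inside the proxy, and the model string is effectively a label because the proxy forwards every request to its configured model):
from crewai import LLM

cerebras_llm = LLM(
    model="openrouter/qwen/qwen3-32b",    # label only; the proxy decides the actual model
    base_url="http://localhost:8001/v1",  # the local FastAPI proxy from the previous snippet
    api_key="not-used-by-the-proxy",      # placeholder; authentication happens inside the proxy
)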
Now, I was able to re-implement my agent script using the new model! This new code swaps out most of the calls to the OpenAI model with calls to my Cerebras model:
import os
import sys
import re
import requests
import tempfile
import webbrowser
import html
from pathlib import Path
from typing import Type, List, Optional, Dict, Any, Union
import json

from pydantic import BaseModel, Field

from crewai import Agent, Task, Crew, Process
from crewai.tools import BaseTool
from crewai import LLM
import weave; weave.init("crewai_debug_agent")

from langchain_openai import ChatOpenAI
import os
import re
import asyncio
import httpx
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Type
import subprocess


LOGFILE = sys.argv[1] if len(sys.argv) > 1 else "/tmp/agentpython-stderr.log"
GITHUB_TOKEN = os.getenv('GITHUB_TOKEN')

# Read the log file BEFORE kicking off
LOG_CONTENT = ""
if os.path.exists(LOGFILE) and os.path.getsize(LOGFILE) > 0:
with open(LOGFILE, 'r') as f:
LOG_CONTENT = f.read()
print(f"\033[95m[LOG CONTENT LOADED] {len(LOG_CONTENT)} characters from {LOGFILE}\033[0m")
else:
LOG_CONTENT = "No log file found or file is empty"
print(f"\033[95m[LOG] No content found in {LOGFILE}\033[0m")

def verbose_print(msg):
print(f"\033[95m[LOG] {msg}\033[0m", flush=True)

# ----- LLM (local or OpenRouter, as per your local config) -----
cerebras_llm = LLM(
model="openrouter/meta-llama/llama-4-scout",
base_url="http://localhost:8001/v1",
api_key="put_this_in_your_api_script"
)

# ----- Tool Input Schemas -----
class LogAnalysisInput(BaseModel):
log_content: str = Field(..., description="Log content to analyze (already loaded)")

class SearchQueryInput(BaseModel):
error_text: str = Field(..., description="Error text to generate search query from")

class CombinedSearchInput(BaseModel):
query: str = Field(..., description="Search query for both GitHub issues and web")
owner: str = Field(default="", description="GitHub repository owner")
repo: str = Field(default="", description="GitHub repository name")

class FileAnalysisInput(BaseModel):
log_content: str = Field(..., description="Log content to extract file information from")

class FileSnippetInput(BaseModel):
file_path: str = Field(..., description="Path to the file to get snippet from")
line: Optional[int] = Field(default=None, description="Line number to focus on")
n_lines: int = Field(default=20, description="Number of lines to return")

class ToolSuggestionInput(BaseModel):
error_message: str = Field(..., description="Error message to analyze")
code_snippet: str = Field(..., description="Code snippet related to the error")

class ReportGenerationInput(BaseModel):
log: str = Field(..., description="Error log content")
file_snippet: str = Field(default="", description="Relevant code snippet")
tools: str = Field(default="", description="Tool recommendations")
gh_results: str = Field(default="", description="GitHub search results")
web_results: str = Field(default="", description="Web search results")

# ----- Tools -----

class LogReaderTool(BaseTool):
name: str = Field(default="Log Reader")
description: str = Field(default="Provides access to the pre-loaded log content")
args_schema: Type[BaseModel] = LogAnalysisInput
def _run(self, log_content: str = None) -> str:
verbose_print(f"Using pre-loaded log content")
if not LOG_CONTENT or LOG_CONTENT == "No log file found or file is empty":
return "[LOG] Log file empty or not found. No action needed."
is_python_error = "Traceback" in LOG_CONTENT or "Exception" in LOG_CONTENT or "Error" in LOG_CONTENT
error_type = "Python Error" if is_python_error else "General Error"
return f"Error Type: {error_type}\n\nLog Content:\n{LOG_CONTENT}"

class SearchQueryGeneratorTool(BaseTool):
name: str = Field(default="Search Query Generator")
description: str = Field(default="Generates optimized search queries from error messages")
args_schema: Type[BaseModel] = SearchQueryInput
def _run(self, error_text: str) -> str:
verbose_print("Generating search query via LLM...")
try:
prompt = (
"Given this error or question, write a concise search query to help the person find a solution online. "
"Output only the query (no explanation):\n\n" + error_text
)
query = cerebras_llm.call(prompt)
return f"Generated search query: {query.strip()}"
except Exception as e:
return f"Error generating search query: {str(e)}"


class CombinedSearchTool(BaseTool):
name: str = Field(default="Combined GitHub & Web Search")
description: str = Field(default="Searches both GitHub issues and the web in one call, returning both results.")
args_schema: Type[BaseModel] = CombinedSearchInput

def _run(self, query: str, owner: str = "", repo: str = "") -> dict:
return asyncio.run(self._async_combined(query, owner, repo))

async def _async_combined(self, query: str, owner: str = "", repo: str = "") -> dict:
# Launch both searches in parallel
tasks = [
self._github_search(query, owner, repo),
self._web_search(query)
]
github_results, web_results = await asyncio.gather(*tasks)
return {
"github_issues": github_results,
"web_search": web_results
}

async def _github_search(self, query: str, owner: str, repo: str):
GITHUB_TOKEN = os.getenv('GITHUB_TOKEN')
url = 'https://api.github.com/search/issues'
headers = {'Accept': 'application/vnd.github.v3+json'}
if GITHUB_TOKEN:
headers['Authorization'] = f'token {GITHUB_TOKEN}'
gh_query = f'repo:{owner}/{repo} is:issue {query}' if owner and repo else query
params = {'q': gh_query, 'per_page': 5}
try:
async with httpx.AsyncClient(timeout=15) as client:
resp = await client.get(url, headers=headers, params=params)
if resp.status_code == 200:
items = resp.json().get("items", [])
return [
{
"number": item.get("number"),
"title": item.get("title"),
"url": item.get("html_url"),
"body": (item.get("body") or "")[:500]
}
for item in items
]
else:
return [{"error": f"GitHub search failed: {resp.status_code} {resp.text}"}]
except Exception as e:
return [{"error": f"Error searching GitHub: {str(e)}"}]

# ---- WEB SEARCH (from your preferred implementation, no markdown parsing) ----
def _extract_json(self, text):
m = re.search(r"```json\s*(.*?)\s*```", text, re.DOTALL)
if not m:
m = re.search(r"```(.*?)```", text, re.DOTALL)
block = m.group(1) if m else text
try:
j = json.loads(block)
return j if isinstance(j, list) else [j]
except Exception:
return []
async def _web_search(self, query: str, n_results: int = 5):
client = OpenAI()
prompt = (
f"Show me {n_results} of the most important/useful web results for this search along with a summary of the problem and proposed solution: '{query}'. "
"Return as markdown JSON:\n"
"[{\"title\": ..., \"url\": ..., \"date_published\": ..., \"snippet\": ...}]"
)
# Run in threadpool for IO
loop = asyncio.get_running_loop()
def blocking_openai():
response = client.responses.create(
model="gpt-4.1",
tools=[{"type": "web_search_preview"}],
input=prompt,
)
return self._extract_json(response.output_text)
return await loop.run_in_executor(None, blocking_openai)

class FileAnalysisTool(BaseTool):
name: str = Field(default="File Analysis")
description: str = Field(default="Extracts file paths and line numbers from error logs")
args_schema: Type[BaseModel] = FileAnalysisInput
def _run(self, log_content: str = None) -> str:
verbose_print("Invoking LLM to identify files from log...")
# Use the global LOG_CONTENT if log_content not provided
content_to_analyze = log_content or LOG_CONTENT
try:
prompt = (
"Given this error message or traceback, list all file paths (and, if available, line numbers) involved in the error. "
"Output one JSON per line, as:\n"
'{"file": "path/to/file.py", "line": 123}\n'
'If line is not found, use null.\n'
f"\nError:\n{content_to_analyze}"
)
output = cerebras_llm.call(prompt)
results = []
for l in output.splitlines():
l = l.strip()
if not l:
continue
try:
results.append(eval(l, {"null": None}))
except Exception as exc:
verbose_print(f"[File Extraction Skipped Line]: {l!r} ({exc})")
return f"Files found in error: {results}"
except Exception as e:
return f"Error analyzing files: {str(e)}"

class FileSnippetTool(BaseTool):
name: str = Field(default="File Snippet Extractor")
description: str = Field(default="Extracts code snippets from files around specific lines")
args_schema: Type[BaseModel] = FileSnippetInput
def _run(self, file_path: str, line: Optional[int] = None, n_lines: int = 20) -> str:
if not os.path.exists(file_path):
return f"File not found: {file_path}"
try:
with open(file_path, "r") as f:
lines = f.readlines()
if line and 1 <= line <= len(lines):
s = max(0, line-6)
e = min(len(lines), line+5)
code = lines[s:e]
else:
code = lines[:n_lines]
return f"Code snippet from {file_path}:\n{''.join(code)}"
except Exception as e:
return f"Error reading file {file_path}: {str(e)}"

class ToolSuggestionTool(BaseTool):
name: str = Field(default="Tool Suggestion")
description: str = Field(default="Suggests which debugging tools to use next based on error analysis")
args_schema: Type[BaseModel] = ToolSuggestionInput
def _run(self, error_message: str, code_snippet: str) -> str:
verbose_print("Requesting tool suggestions via LLM...")
prompt = (
"You are an AI debugging orchestrator. The following is a Python error message and a snippet of code "
"from a file involved in the error. Based on this, choose which tools should be used next, and explain why. "
"Possible tools: github_issue_search, web_search, static_analysis. "
"Always recommend github_issue_search as it's very helpful. "
"Provide your recommendation in a clear, structured format.\n"
"Error:\n" + error_message + "\n\nFile snippet:\n" + code_snippet
)
try:
return cerebras_llm.call(prompt).strip()
except Exception as e:
return f"Error generating tool suggestions: {str(e)}"

class ReportGeneratorTool(BaseTool):
name: str = Field(default="HTML Report Generator")
description: str = Field(default="Generates HTML debug reports")
args_schema: Type[BaseModel] = ReportGenerationInput
def _run(self, log: str, file_snippet: str = "", tools: str = "", gh_results: str = "", web_results: str = "") -> str:
verbose_print("Writing HTML report ...")
out_path = os.path.join(tempfile.gettempdir(), 'dbg_report.html')
try:
with open(out_path, "w", encoding="utf-8") as f:
f.write("<html><head><meta charset='utf-8'><title>Debug Results</title></head><body>\n")
f.write("<h1 style='color:#444;'>Debugging Session Report</h1>\n")
f.write("<h2>Error Log</h2>")
f.write("<pre style='background:#f3f3f3;padding:8px;'>" + html.escape(log or "None") + "</pre>")
if file_snippet:
f.write("<h2>Relevant Source Snippet</h2><pre style='background:#fafaff;padding:8px;'>" + html.escape(file_snippet) + "</pre>")
if tools:
f.write("<h2>LLM Tool Recommendations</h2><pre style='background:#eef;'>" + html.escape(tools) + "</pre>")
if gh_results:
f.write("<h2>GitHub & Web Search Results</h2><pre>" + html.escape(gh_results) + "</pre>")
if web_results:
f.write("<h2>Web Search AI Answer</h2><pre>" + html.escape(web_results) + "</pre>")
f.write("</body></html>")
return f"HTML report generated and opened at: {out_path}"
except Exception as e:
return f"Error generating HTML report: {str(e)}"

# --- Tool Instances
log_reader_tool = LogReaderTool()
search_query_generator_tool = SearchQueryGeneratorTool()
combined_search_tool = CombinedSearchTool()
file_analysis_tool = FileAnalysisTool()
file_snippet_tool = FileSnippetTool()
tool_suggestion_tool = ToolSuggestionTool()
report_generator_tool = ReportGeneratorTool()

# --- Agents ---
log_analyst_agent = Agent(
role="Log Analysis Specialist",
goal="Analyze the pre-loaded log content to identify errors and extract relevant information",
backstory="Expert in parsing error logs and identifying the root causes of issues",
tools=[log_reader_tool, file_analysis_tool],
allow_delegation=False,
llm=cerebras_llm
)

search_specialist_agent = Agent(
role="Search Query Specialist",
goal="Generate optimized search queries from error messages for effective problem resolution. The search must be less that 100 chars long!!!!!!!!!!",
backstory="Expert in crafting search queries that yield the most relevant debugging results. The search must be less that 100 chars long!!!!!!!!!!",
tools=[search_query_generator_tool],
allow_delegation=False,
llm=cerebras_llm
)

combined_research_agent = Agent(
role="Combined Repository and Web Search Specialist",
goal="Search both GitHub issues and the web for relevant solutions to errors and problems. You must use the combined_search_tool no matter what!!!!!!! Try to summarize each specific github/web problem and solution to help the user solve their issue. Make sure to include the links from the original sources next to their corresponding summaries / code etc",
backstory="Expert in both GitHub open-source research and web documentation sleuthing for code solutions.",
tools=[combined_search_tool],
allow_delegation=False,
llm=ChatOpenAI(model_name="gpt-4.1", temperature=0.0)
)

code_analyst_agent = Agent(
role="Code Analysis Specialist",
goal="Analyze code snippets and suggest debugging approaches",
backstory="Expert in code analysis and debugging strategy recommendation",
tools=[file_snippet_tool],
allow_delegation=False,
llm=cerebras_llm
)

report_generator_agent = Agent(
role="Debug Report Generator",
goal="Compile all debugging information into comprehensive HTML reports. Make sure to include the links to sources when they are provides -- but DO NOT make up links if they are not given. Write an extensive report covering all possible solutions to the problem!!!",
backstory="Specialist in creating detailed, actionable debugging reports",
tools=[report_generator_tool],
allow_delegation=False,
llm=cerebras_llm
)

# --- Tasks ---
log_analysis_task = Task(
description=f"Analyze the pre-loaded log content. The log content is already available: {LOG_CONTENT[:500]}... Extract error information and identify the type of error.",
expected_output="Detailed analysis of the log content including error type and content",
agent=log_analyst_agent,
output_file="log_analysis.md"
)

file_extraction_task = Task(
description="Extract file paths and line numbers from the analyzed log content. Use the pre-loaded log content to identify which files are involved in the error.",
expected_output="List of files and line numbers involved in the error",
agent=log_analyst_agent,
context=[log_analysis_task],
output_file="file_analysis.md"
)

search_query_task = Task(
description="Generate optimized search queries based on the error analysis for finding solutions online. The search must be less that 100 chars long!!!!!!!!!!",
expected_output="Optimized search queries for the identified errors. The search must be less that 100 chars long!!!!!!!!!!",
agent=search_specialist_agent,
context=[log_analysis_task],
output_file="search_queries.md"
)

combined_search_task = Task(
description="Use the search queries to search both GitHub issues and the wide web for solutions. Make sure to make a very robust report incorporating ALL sources. Dont just give desciptions of the issue- write a detailed summary showcasing code and exact explanations to issues in the report.",
expected_output="Relevant GitHub issues and web documentation/articles/answers.",
agent=combined_research_agent,
context=[search_query_task],
output_file="combined_results.md"
)

code_analysis_task = Task(
description="Extract and analyze code snippets from the implicated files. Suggest debugging tools and approaches.",
expected_output="Code snippets and debugging tool recommendations",
agent=code_analyst_agent,
context=[file_extraction_task],
output_file="code_analysis.md"
)

report_generation_task = Task(
description="Compile all debugging information into a comprehensive HTML report and open it in the browser. Make sure to make a very robust report incorporating ALL sources Make sure to include the links to sources when they are provides -- but DO NOT make up links if they are not given. -- ALL sourced information must be cited!!!!!! Write an extensive report covering all possible solutions to the problem!!!",
expected_output="Complete HTML debugging report",
agent=report_generator_agent,
context=[log_analysis_task, combined_search_task, code_analysis_task],
output_file="debug_report.html"
)

# --- Run Crew ---
crew = Crew(
agents=[
log_analyst_agent,
search_specialist_agent,
combined_research_agent,
code_analyst_agent,
report_generator_agent
],
tasks=[
log_analysis_task,
file_extraction_task,
search_query_task,
combined_search_task,
code_analysis_task,
report_generation_task
],
process=Process.sequential,
verbose=True
)

if __name__ == "__main__":
print(f"\033[95m[STARTING] Log content loaded: {len(LOG_CONTENT)} chars\033[0m")
result = crew.kickoff()
print("\n\nDebug Analysis Complete:\n")
print(result)
# Try to open the generated report
report_path = './debug_report.html'
if os.path.exists(report_path):
verbose_print(f"Opening final report: {report_path}")
if sys.platform.startswith("darwin"):
subprocess.Popen(['open', report_path])
elif sys.platform.startswith("linux"):
subprocess.Popen(['xdg-open', report_path])
elif sys.platform.startswith("win"):
os.startfile(report_path)

Note that you will need to make sure your local proxy for the Cerebras-hosted model is running before using the agent above.
Once this new model was in place, latency improvements were immediately noticeable. I could see in my Weave traces that total agent run time dropped dramatically, to under a minute of total latency. Each step that required an LLM call was completed much faster, making the whole agent workflow far more responsive and interactive.
After conducting further analysis of my agent within Weave, I discovered another opportunity to reduce latency. I noticed that my agent was running the GitHub issue search and the web search sequentially within the tool. Each of these operations can take several seconds, especially if the web search involves calling an LLM or a third-party API. This meant that the total time for this single tool step could be quite high, even though the operations themselves were independent of each other.
Here's a screenshot inside Weave before optimizing my search tool:

To address this, I refactored my CombinedSearchTool so that both searches would run in parallel using Python’s asyncio and async-compatible HTTP clients. By launching both the GitHub and web searches concurrently and then awaiting their results together, I was able to cut the end-to-end tool latency almost in half compared to the sequential version. This update was simple to implement:
class CombinedSearchTool(BaseTool):
name: str = Field(default="Combined GitHub & Web Search")
description: str = Field(default="Searches both GitHub issues and the web in one call, returning both results.")
args_schema: Type[BaseModel] = CombinedSearchInput

def _run(self, query: str, owner: str = "", repo: str = "") -> dict:
return asyncio.run(self._async_combined(query, owner, repo))

async def _async_combined(self, query: str, owner: str = "", repo: str = "") -> dict:
# Launch both searches in parallel
tasks = [
self._github_search(query, owner, repo),
self._web_search(query)
]
github_results, web_results = await asyncio.gather(*tasks)
return {
"github_issues": github_results,
"web_search": web_results
}

async def _github_search(self, query: str, owner: str, repo: str):
GITHUB_TOKEN = os.getenv('GITHUB_TOKEN')
url = 'https://api.github.com/search/issues'
headers = {'Accept': 'application/vnd.github.v3+json'}
if GITHUB_TOKEN:
headers['Authorization'] = f'token {GITHUB_TOKEN}'
gh_query = f'repo:{owner}/{repo} is:issue {query}' if owner and repo else query
params = {'q': gh_query, 'per_page': 5}
try:
async with httpx.AsyncClient(timeout=15) as client:
resp = await client.get(url, headers=headers, params=params)
if resp.status_code == 200:
items = resp.json().get("items", [])
return [
{
"number": item.get("number"),
"title": item.get("title"),
"url": item.get("html_url"),
"body": (item.get("body") or "")[:500]
}
for item in items
]
else:
return [{"error": f"GitHub search failed: {resp.status_code} {resp.text}"}]
except Exception as e:
return [{"error": f"Error searching GitHub: {str(e)}"}]

After making this change, the performance improvement was reflected immediately in my Weave traces. The tool's combined step now showed much lower latency, and the agent overall became noticeably more responsive. This underscores the value of asynchronously parallelizing independent external API calls in agent tool implementations, especially when supporting interactive or real-time workloads.
Here we can see the Weave summary of our run, showing that we shaved off another 15 seconds of runtime!


Conclusion

Building complex, multi-agent workflows with frameworks like CrewAI unlocks powerful capabilities for real-world automation and problem-solving. But as the number of moving parts grows, so do the challenges of debugging, performance tuning, and error-handling. This is where integrating an observability layer like W&B Weave becomes invaluable.
Throughout this project, Weave proved to be much more than just a logging tool. For every change I made, including fixing hidden bugs, switching to a faster language model, or parallelizing independent tool calls, Weave gave me clear and actionable feedback. Its detailed traces and dashboards made root-cause analysis straightforward, and allowed me to continuously optimize my agents. What initially felt like an opaque, slow-moving black box quickly became a transparent, interactive workflow, where every misstep and inefficiency was surfaced in real-time.
The combination of CrewAI and Weave enabled me to iterate rapidly, experiment confidently, and measurably improve the reliability and speed of my agent pipeline. If you are developing advanced LLM-driven agents, adopting this observability-first approach is a game-changer. Not only can you catch functional bugs early, but you can also closely monitor cost, latency, and resource usage as your system evolves. Ultimately, this leads to faster development, more robust agents, and a smoother overall experience for both developers and users.

Iterate on AI agents and models faster. Try Weights & Biases today.