
Tutorial: Building AI agents with CrewAI

This guide explores how AI agents, powered by CrewAI, automate complex tasks with minimal human input by integrating adaptive workflows, real-time data analysis, and iterative improvements.
Created on February 24 | Last edited on February 26
AI agents are transforming the way tasks are automated, making them more adaptive, intelligent, and capable of handling complex workflows with minimal human input. Whether you're new to AI or looking to build your first agent, this guide walks you through the entire process - from defining an agent’s role to setting up the right tools and integrating it into real-world applications.
We'll be using CrewAI, a framework designed to make multi-agent systems easy to build and manage. By breaking down AI agent development into simple, manageable steps, you’ll learn how to create a system that not only automates tasks but also refines its outputs through iterative improvements. We’ll cover how to define scope, choose the right tools, structure workflows, and evaluate performance, ensuring your AI agent is both effective and reliable.
For those who want to run through the example before building your own agent with CrewAI, there's a Colab to do just that here:

For those who want to get started learning, let's dive in.


What are AI agents?

AI agents are autonomous systems that use LLMs and generative AI to search, process, and analyze information. Unlike traditional automation, which follows rigid rules, AI agents adapt to their environment. They use memory, reasoning, and external tools to analyze information and complete tasks with minimal human oversight.
The first step in building an AI agent, whether with CrewAI or otherwise, is defining its purpose, scope, and boundaries. Without clear guidelines, an agent may produce irrelevant, misleading, or overly broad outputs. A well-defined scope ensures efficiency, accuracy, and alignment with the intended goal.
For this example, we are building an AI-powered news research agent using CrewAI. The goal is to autonomously track and summarize the latest developments in AI by scanning the web, analyzing sources, and producing structured reports. Our system consists of two agents with specialized roles:
  1. AI News Researcher: Finds the latest AI news published within the last 24 hours. It filters out outdated or low-relevance content and provides direct links to credible sources.
  2. AI Story Investigator: Expands on the initial findings by retrieving deeper context, additional sources, and expert opinions. Each news item receives a 300-word detailed breakdown, ensuring a well-researched summary.

Defining the scope and constraints

To maintain accuracy, our agent:
  • Scans only the latest AI news (published within the last 24 hours) to ensure freshness and relevance.
  • Prioritizes credible sources while filtering out low-quality or unreliable content.
  • Focuses on significant AI advancements, including new models, research breakthroughs, and industry trends.
This setup enables automated AI news tracking, reducing the need for manual research while maintaining high-quality, structured outputs.
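These constraints live in the agents' prompts rather than in code, but to make them concrete, here is one way the freshness rule could be expressed as a plain Python filter. This is an illustrative sketch only; the names SCOPE and within_scope are our own, not part of CrewAI:

```python
from datetime import datetime, timedelta

# Illustrative sketch only: expressing the agent's scope as data that a
# filtering step could consult. Not part of the CrewAI setup below.
SCOPE = {
    "topics": ["new AI models", "research breakthroughs", "industry trends"],
    "max_age": timedelta(hours=24),  # only keep very fresh stories
}

def within_scope(article, now=None):
    """Keep an article only if it was published within the freshness window."""
    now = now or datetime.now()
    return now - article["published"] <= SCOPE["max_age"]
```

In the actual agent these rules are enforced through prompt instructions, but keeping them written down as data like this makes the scope explicit and easy to tighten later.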

Building your first AI agent with CrewAI

Before diving into the code, let’s break down the core components of CrewAI and how they work together. CrewAI provides a framework for defining and managing multi-agent AI systems, allowing agents to interact, delegate tasks, and autonomously complete complex workflows.

Core components of CrewAI

  1. Agents: Autonomous AI entities that perform specific tasks. Each agent has a goal, a backstory, and tools to complete its assigned tasks efficiently.
  2. Tasks: The discrete objectives assigned to agents. Tasks define what needs to be done, what the expected output should be, and which agent is responsible for completing it.
  3. Crew: The orchestrator of multiple agents and tasks. It manages execution workflows (e.g., sequential or parallel) and ensures smooth data exchange between agents.
  4. Tools: External APIs, search tools, or functions that agents can use to expand their capabilities. For example, an agent can use a web search tool to find new information.

Using config files vs. the inline format in CrewAI

CrewAI lets you define agents and tasks either in YAML configuration files or inline in Python. For complex systems, YAML files improve maintainability. For simpler use cases like ours, the inline Python format is preferable: it keeps everything in one script, provides better visibility into how components interact, and allows dynamic modifications at runtime.
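For reference, the YAML alternative would move agent definitions into a config file along these lines. This is a sketch based on CrewAI's config conventions; verify the field names against the version you install. We stick with the inline format for the rest of this tutorial.

```yaml
# agents.yaml - illustrative sketch of the YAML approach
researcher:
  role: AI News Researcher
  goal: Find the most recent AI news from the last 24 hours.
  backstory: An AI-focused journalist specializing in breaking news and emerging trends.

story_researcher:
  role: AI Story Investigator
  goal: Expand each news story with additional context, sources, and insights.
  backstory: A deep-research AI journalist who builds on initial news findings.
```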
To enable our agent to browse the web and find real-time AI news, we need access to a search API. In this case, we are using Serper, a Google Search API that allows our agent to retrieve up-to-date results. Each account comes with 2,500 free credits.
Additionally, since our AI agent relies on OpenAI’s models for processing and summarization, you’ll also need an OpenAI API key. If you don’t already have one, you can generate it from OpenAI’s platform. Both API keys must be set in your environment for the agent to function properly.
Here's how you can set the API keys in your terminal:
On macOS/Linux:
export SERPER_API_KEY=your_serper_api_key_here
export OPENAI_API_KEY=your_openai_api_key_here
On Windows (PowerShell):
$env:SERPER_API_KEY="your_serper_api_key_here"
$env:OPENAI_API_KEY="your_openai_api_key_here"
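After setting them, you can quickly verify that both keys are visible to Python before kicking off the crew. This is a small optional helper of our own, not part of CrewAI:

```python
import os

REQUIRED_KEYS = ("SERPER_API_KEY", "OPENAI_API_KEY")

def missing_keys(env=os.environ):
    """Return the names of any required API keys that are not set."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]

missing = missing_keys()
if missing:
    print(f"Warning: missing environment variables: {', '.join(missing)}")
else:
    print("All API keys found.")
```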
Also make sure to install the following pip dependencies:
pip install 'crewai[tools]' langchain-openai weave
Now, let’s build an AI agent with CrewAI that automatically tracks and summarizes AI news stories. This implementation consists of:
  • An AI News Researcher to find the latest AI news.
  • An AI Story Investigator to analyze and expand on key findings.
  • A Crew to manage and execute tasks in sequence.
import os
from datetime import datetime
from textwrap import dedent

from crewai import Agent, Crew, Process, Task
from crewai_tools import SerperDevTool
from langchain_openai import ChatOpenAI

import weave

weave.init("crewai-ai-news-agent")

search_tool = SerperDevTool()

researcher = Agent(
    role="AI News Researcher",
    backstory="An AI-focused journalist specializing in breaking news and emerging trends.",
    goal="Find the most recent AI news from the last 24 hours. Note --> IF NEWS IS NOT WITHIN THE LAST FEW DAYS, DO NOT INCLUDE IT!!!!!!!",
    tools=[search_tool],
    allow_delegation=False,
    verbose=True,
    max_iter=3,
    llm=ChatOpenAI(model_name="gpt-4o-mini", temperature=0.8),
)

story_researcher = Agent(
    role="AI Story Investigator",
    backstory="A deep-research AI journalist who expands on initial news findings, gathering additional context, sources, and insights for each AI story.",
    goal="For each news story found, conduct a deeper investigation to retrieve additional information, sources, and full context. Write **at least 300 words** for each story, including details, expert opinions, and possible implications. Ensure you provide the full URL of the main source.",
    tools=[search_tool],
    allow_delegation=False,
    verbose=True,
    max_iter=5,
    llm=ChatOpenAI(model_name="gpt-4o-mini", temperature=0.7),
)

research_task = Task(
    description=dedent("""
        Search for the most recent AI news published in the last 24 hours.
        Extract the **5 most important developments**, including company names, new AI models, breakthroughs, and market trends.

        **Query Parameters:**
        - Search for AI-related news.
        - Filter results to only include content from the last 24 hours.

        **Example Output Format:**
        1. OpenAI announces GPT-5 development with a new multimodal framework. (Source: TechCrunch - https://techcrunch.com/example)
        2. Google DeepMind unveils Gemini 1.5 with extended context length. (Source: The Verge - https://theverge.com/example)
        3. NVIDIA releases new AI GPUs optimized for deep learning. (Source: Reuters - https://reuters.com/example)
    """),
    expected_output="A list of the 5 most important AI news updates from the last 24 hours, each with a direct source URL.",
    agent=researcher,
    output_file="latest_ai_news.md",
)

story_research_task = Task(
    description=dedent("""
        For each AI news story identified, perform **detailed research** to provide a comprehensive 300-word analysis.

        **Steps:**
        1. Search for additional details, expert opinions, technical specifications, and market reactions for each news story.
        2. Write a **detailed 300-word breakdown** of each AI news update.
        3. Verify and include **full URLs** for all primary sources.

        **Example Output Format:**
        ### OpenAI's GPT-5 Development
        OpenAI has officially confirmed the development of GPT-5, its next-generation language model. The company states that this iteration will be highly multimodal, improving text, image, and potentially even video understanding. According to a TechCrunch interview with OpenAI CEO Sam Altman, GPT-5 is expected to have a significantly larger context window, allowing for improved memory and long-term coherence.

        Industry experts speculate that OpenAI will release GPT-5 in late 2025, focusing on fine-tuning multimodal capabilities. Some researchers have raised concerns about AI alignment, urging OpenAI to implement stronger safeguards against biases and hallucinations.

        Major competitors, including Google DeepMind and Anthropic, are also preparing advanced AI models in response. DeepMind's Gemini 1.5 is expected to launch before GPT-5, increasing competition in the space.

        **Sources:**
        - TechCrunch: https://techcrunch.com/example
        - MIT Technology Review: https://technologyreview.com/example
        - OpenAI Blog: https://openai.com/example

        Repeat this level of depth for each of the 5 AI news stories.
    """),
    expected_output="A detailed 300-word analysis for each AI news story, with full source citations.",
    agent=story_researcher,
    context=[research_task],
    output_file="expanded_ai_news.md",
)

crew = Crew(
    agents=[researcher, story_researcher],
    tasks=[research_task, story_research_task],
    verbose=True,
    process=Process.sequential,
)

def run():
    inputs = {
        "topic": "Latest AI News",
        "current_year": str(datetime.now().year),
        "current_month": str(datetime.now().month),
        "current_day": str(datetime.now().day),
    }

    result = crew.kickoff(inputs=inputs)

    print("\n\n### Final AI News Report ###\n")
    print(result)

    print("\n\n### Task Outputs ###\n")
    for task in crew.tasks:
        if task.output_file:
            with open(task.output_file, "r") as f:
                print(f"Task: {task.description}\nOutput:\n{f.read()}\n{'-'*40}")

if __name__ == "__main__":
    run()

Breaking down the code

Now that we've seen the implementation, let’s go over the key components. Each agent is given a specific role, goal, and set of tools. In this example, we have a News Researcher responsible for gathering recent AI news and a Story Investigator who expands on those findings with additional research.
Agents are assigned tasks that align with their roles. The research task focuses on searching for AI news from the last 24 hours, while the story research task takes those findings and generates detailed reports with deeper context. The Crew object manages execution, ensuring that agents complete tasks in a structured sequence. Since we use a sequential process, one task must finish before the next begins.
This example demonstrates how CrewAI enables modular, scalable AI systems where agents specialize in different roles. From here, the agent can be expanded by adding more tools, integrating APIs, or defining new behaviors to enhance its capabilities.
Since the underlying LLM inference library already integrates with Weave, all we need to do is import Weave and initialize it with weave.init, and every call to our model will show up inside Weave. This is useful for refining prompts, improving accuracy, and debugging any inconsistencies. Instead of manually inspecting logs, Weave provides a structured view of how information flows through the system, making optimizations and adjustments much easier.
Here's a screenshot of what it looks like inside Weave after running our script:


Defining the tasks and prompts for your CrewAI agent

Defining the tasks and prompts that guide each CrewAI agent’s behavior is extremely important. To better understand how these agents operate, let's break down the specific prompts used for both the AI News Researcher and the AI Story Investigator.
The research task serves as the foundation for gathering recent news. Here’s the breakdown of the prompt used for the AI News Researcher:
description=dedent("""
Search for the most recent AI news published in the last 24 hours.
Extract the **5 most important developments**, including company names, new AI models, breakthroughs, and market trends.

**Query Parameters:**
- Search for AI-related news.
- Filter results to only include content from the last 24 hours.

**Example Output Format:**
1. OpenAI announces GPT-5 development with a new multimodal framework. (Source: TechCrunch - https://techcrunch.com/example)
2. Google DeepMind unveils Gemini 1.5 with extended context length. (Source: The Verge - https://theverge.com/example)
3. NVIDIA releases new AI GPUs optimized for deep learning. (Source: Reuters - https://reuters.com/example)
""")
This prompt ensures that the AI News Researcher agent gathers AI-related news published within the past 24 hours. The query parameters help the agent filter out outdated information and focus on the most relevant and recent developments. The agent prioritizes AI advancements, including new models, breakthroughs, and industry trends. This approach guarantees that the news collected is timely and directly relevant to the project’s goal of summarizing key AI developments.
The output format is structured to present each news item concisely. Each result includes a headline summarizing the event, the source where the news was published, and a direct URL link. This clear format ensures that the output is easily readable and accessible, guiding the AI News Researcher agent to collect and present information effectively.
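Because the output format is predictable, downstream code can parse it mechanically. As a hedged sketch (the regex and function name are ours, and it assumes the agent follows the format exactly):

```python
import re

# Matches lines like:
# "1. OpenAI announces GPT-5 development. (Source: TechCrunch - https://techcrunch.com/example)"
LINE_RE = re.compile(
    r"^\d+\.\s+(?P<headline>.+?)\s+"
    r"\(Source:\s*(?P<source>[^-]+?)\s*-\s*(?P<url>https?://\S+?)\)$"
)

def parse_news_line(line):
    """Return {'headline', 'source', 'url'} for a formatted line, else None."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None
```

A structured format like this is what makes it possible to feed the researcher's output into the second agent, or into integrations such as Slack or email, without manual cleanup.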
The second agent—the AI Story Investigator—is responsible for expanding on the initial findings by providing deeper context, additional sources, and expert opinions. Here’s the breakdown of the story research task:
description=dedent("""
For each AI news story identified, perform **detailed research** to provide a comprehensive 300-word analysis.

**Steps:**
1. Search for additional details, expert opinions, technical specifications, and market reactions for each news story.
2. Write a **detailed 300-word breakdown** of each AI news update.
3. Verify and include **full URLs** for all primary sources.

**Example Output Format:**
### OpenAI's GPT-5 Development
OpenAI has officially confirmed the development of GPT-5, its next-generation language model. The company states that this iteration will be highly multimodal, improving text, image, and potentially even video understanding. According to a TechCrunch interview with OpenAI CEO Sam Altman, GPT-5 is expected to have a significantly larger context window, allowing for improved memory and long-term coherence.

Industry experts speculate that OpenAI will release GPT-5 in late 2025, focusing on fine-tuning multimodal capabilities. Some researchers have raised concerns about AI alignment, urging OpenAI to implement stronger safeguards against biases and hallucinations.

Major competitors, including Google DeepMind and Anthropic, are also preparing advanced AI models in response. DeepMind’s Gemini 1.5 is expected to launch before GPT-5, increasing competition in the space.

**Sources:**
- TechCrunch: https://techcrunch.com/example
- MIT Technology Review: https://technologyreview.com/example
- OpenAI Blog: https://openai.com/example
""")
This prompt guides the AI Story Investigator to conduct in-depth research and produce a comprehensive 300-word analysis for each news item. The agent gathers multiple sources so that each story is thoroughly contextualized. By verifying with sources and including full URLs, the agent maintains transparency and credibility.
The agent expands on the initial news item, explaining its significance, incorporating expert insights, and detailing its technical or industry impact. The inclusion of verified sources ensures that the final output is well-supported and reliable.
The effectiveness of these agents depends on the clarity and precision of their prompts. The AI News Researcher ensures the news is timely by focusing only on stories from the past 24 hours, while the Story Investigator adds depth and context to each story. Together, these agents create a streamlined AI news research system that can continuously refine its results for better accuracy and relevance.

How to build an AI agent with CrewAI

Building an AI agent isn’t just about writing code. It’s about understanding the task, breaking it into smaller steps, and iterating until the system works reliably.
Developing a useful AI agent requires multiple iterations, adjustments, and refinements to get it right. Here are the essential steps to ensure your agent is reliable, effective, and scalable.

Step 1 – Conceptualization and goal definition

Begin by outlining your AI agent’s goal, whether it’s automating customer interactions or enhancing research efficiency. Defining your agent's scope helps set clear development targets and ensures the system remains focused. Proper planning at this stage is necessary to avoid unnecessary complexity and guide the AI’s decision-making process.
Since AI agents rely on LLMs to process information and execute tasks, consider how the model will interact with data and refine responses. A well-structured foundation improves adaptability and performance, making the agent more effective in achieving its intended purpose.

Step 2 – Do the task manually first (several times)

Before automating anything, run through the task manually to see how it should work. If the agent is gathering AI news, go through the process yourself first to figure out the best sources, what kind of results are actually useful, and what issues come up.
The search tool is the backbone of the system; if it performs poorly, the agent's final results will be poor too. Going through the process manually helps define the scope of the agent, preventing unnecessary complexity or irrelevant functionality.

Step 3 – Break the task down into smaller components and define instructions

Once you have a clear understanding of the workflow, you can break the task into modular steps that the AI agent can handle. Rather than trying to automate everything at once, focus on creating well-defined subtasks. Develop clear instructions and variables for your AI agent to establish autonomous decision-making capabilities; this ensures that the agent can operate independently based on predefined parameters.
For example, an AI news research workflow could include:
  1. News Research Agent: Finds and filters the most relevant AI news.
  2. Story Expansion Agent: Analyzes key articles, gathers expert insights, and generates summaries.
Each of these steps is a small, manageable task that an agent can complete reliably. This approach also makes debugging easier, since you can test each component separately before integrating them into a full system.
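As a sketch of this decomposition, each subtask can be modeled as a plain function before it is wired into a Crew. The stubs below stand in for the real agents; all names and return values are illustrative:

```python
def find_news(query):
    # Stub for the News Research Agent: the real version calls the search
    # tool and filters results to the last 24 hours.
    return [{"headline": "Example AI headline", "url": "https://example.com/story"}]

def expand_story(story):
    # Stub for the Story Expansion Agent: the real version performs follow-up
    # searches and writes a detailed breakdown.
    return {**story, "summary": f"Detailed analysis of: {story['headline']}"}

def pipeline(query):
    # Because each stage is a small function, it can be exercised on its own
    # before the stages are handed over to agents.
    return [expand_story(story) for story in find_news(query)]
```

Once each stage behaves as expected in isolation, replacing a stub with an agent-backed implementation becomes a local change rather than a rewrite.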

Step 4 – Testing and iterating

Rarely does an AI agent "work" perfectly on the first try. Chances are, your agent will require multiple iterations to iron out bugs, refine decision-making, and improve accuracy. The first version might return incomplete results, miss important information, or struggle with consistency.
Testing involves:
  • Running real-world examples to see if the agent produces useful outputs.
  • Adjusting prompt instructions, parameters, or tools based on performance.
  • Identifying failure points, such as low-quality data sources or incorrect reasoning.
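Parts of this testing loop can be automated. For example, a lightweight validator (our own sketch, not part of CrewAI) can flag report lines that are missing a source URL:

```python
import re

URL_RE = re.compile(r"https?://\S+")

def lines_missing_sources(report_lines):
    """Return the non-empty report lines that contain no source URL."""
    return [
        line for line in report_lines
        if line.strip() and not URL_RE.search(line)
    ]
```

Running a check like this over latest_ai_news.md after each test run turns "did the agent cite its sources?" from a manual review step into a quick automated one.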
Iteration is an ongoing process. Even after your agent is functional, you may need to fine-tune responses, optimize task delegation, or add better tools to improve efficiency. AI agents don’t just need to work; they need to work reliably and adapt as the problem space evolves.
By following these steps, you create an AI agent that is well-defined, modular, and tested in real-world conditions before full deployment.

Step 5 – Integration with external tools and workflows

Your AI agent’s insights are only valuable if they reach the right people. Integration with tools like Slack, email, or dashboards ensures users can access reports seamlessly within their workflow.
The goal is to make sure updates and insights reach the right people without adding extra steps. It also helps to have some infrastructure for reporting issues and handling failures. Logs, alerts, and simple reporting tools can keep things running smoothly and make debugging easier. If the agent is part of a larger workflow, it should fit in naturally so users don’t have to go out of their way to interact with it.
For a walkthrough of setting up programmatic email sending with Python, including how to send emails from an agent setup, check out this tutorial.
💡 For better monitoring and troubleshooting, tools like Weave can help track the agent’s activity. Logging API calls and interactions provides visibility into how the system is working and makes it easier to catch issues early.

Using Weave for debugging and visualization

In this implementation, Weave is used to log all API calls made by the AI agents to OpenAI. The line weave.init("crewai-ai-news-agent") initializes Weave tracking, which means every request to OpenAI’s API is recorded. This allows us to visualize how the framework makes each API call, providing insight into what’s happening under the hood.
One of the biggest challenges when working with multi-agent frameworks like CrewAI is debugging. When something goes wrong, whether it’s incorrect outputs, agents not completing their tasks properly, or unexpected failures, it’s not always clear where the issue originates. The problem could stem from the LLM’s response, incorrect task definitions, poor agent coordination, or external tool failures.
Weave provides real-time visibility into your AI agent’s behavior by logging API calls and responses. This allows you to:
  • Trace errors: Identify whether failures stem from bad prompts, incorrect search results, or LLM inconsistencies.
  • Refine queries: Adjust search parameters based on logged interactions.
  • Optimize responses: Use logged outputs to fine-tune prompt instructions.
For example, if the AI Story Investigator is generating low-quality summaries, we can check whether the issue comes from the search tool returning bad results, the prompt being poorly structured, or the LLM itself struggling with coherence. Instead of guessing, we can use Weave’s logs to step through the agent’s workflow and see where things go wrong.
This kind of logging and visualization is essential for working with autonomous agents, especially as they become more complex. Without it, debugging is much harder, and it’s easy to get lost when troubleshooting multi-step processes. Weave makes it easier to understand, refine, and improve AI agent behavior based on real-world data.

Using Weave isn’t just helpful during development; it’s also valuable in production after deployment. Once an AI agent is live and being used by real users, it becomes even more important to understand how it's performing in real-world scenarios. Weave provides ongoing visibility into how users interact with the agent, what kinds of queries it’s handling, and where it might be struggling or failing.
For example, if users repeatedly ask for information that the agent isn’t retrieving correctly, Weave’s logs can reveal whether the issue is due to incomplete search queries, poor prompt design, or limitations in the LLM itself. If an agent is generating inaccurate summaries, Weave can help determine whether it's due to bad source selection, hallucinated content, or weak reasoning in the model’s responses. Instead of waiting for users to report issues or blindly tweaking parameters, you can diagnose problems in real-time by reviewing the exact API calls and responses.
This makes Weave a kind of modern debugger for GenAI applications, allowing developers to track an agent’s decisions step-by-step. In a production setting, it also helps monitor agent behavior over time, ensuring that performance doesn’t degrade as AI models evolve. If a new version of an LLM changes how it responds to prompts, or if search APIs start returning lower-quality results, you can catch these issues early.
With logging and monitoring in place, AI agents can be continuously improved based on real usage, making them more reliable and better suited to user needs.

Testing and evaluation

Building AI agents isn’t just about writing code; it’s about ensuring they perform consistently in real-world scenarios. Regular monitoring and evaluation help refine outputs, catch inconsistencies, and improve overall reliability. Since AI workflows don’t follow strict rules like traditional software, understanding how an agent behaves across different tasks is key to making it more effective.
Tools like Weave provide deeper visibility by tracking API calls, logging interactions, and helping debug unpredictable model behavior. Weave also includes an evaluation framework that makes it easier to manage experiments, benchmark different agents, and compare performance over time. Since AI agents require ongoing adjustments - whether through prompt tuning, better data sources, or refined task delegation - continuous evaluation ensures they remain efficient and aligned with their intended purpose.

Conclusion

AI agents represent a shift in how automation is designed and implemented. Unlike traditional systems that rely on rigid rules, AI agents adapt, reason, and interact dynamically with data and external tools. This flexibility enables them to handle complex, evolving tasks such as real-time news tracking, research, and summarization. However, their effectiveness depends on careful design, continuous evaluation, and integration with monitoring tools.
Building a reliable AI agent isn’t a one-time task - it’s an iterative process of refining instructions, testing outputs, and optimizing decision-making. Tools like Weave make this process more manageable by providing visibility into how agents operate, where they succeed, and where they fail. This level of insight is necessary for maintaining accuracy, improving efficiency, and ensuring agents perform as expected in real-world conditions.
As AI systems become more integrated into workflows, the need for structured evaluation and debugging will only grow. The more an agent is tested and refined, the more valuable it becomes. By defining clear goals, structuring workflows, and continuously refining performance, AI agents can move beyond simple automation. With tools like Weave for real-time monitoring, they evolve into powerful, adaptive assistants that streamline decision-making and enhance productivity.


Iterate on AI agents and models faster. Try Weights & Biases today.