
Agentic workflows: Getting started with AI Agents

Explore AI agent workflows for automating tasks with multi-agent systems and generative AI, including a tutorial to build a research assistant for AI summaries.
AI systems are evolving rapidly, enabling machines to tackle complex, multi-step tasks with unprecedented efficiency. At the forefront of these innovations are AI agentic workflows - transformative systems where intelligent agents collaborate dynamically to achieve specific goals. Unlike traditional automation or chatbots, these workflows adapt, make decisions, and execute tasks across various domains.
In this article, we’ll explore and explain agentic workflows, their components and applications, and guide you through building your own system that emails concise summaries of cutting-edge AI research papers.


What are AI agentic workflows?

AI agentic workflows are systems where autonomous AI agents work together to achieve shared goals by managing tasks independently. These workflows use AI models to process information, adapt to context, and execute actions dynamically, enabling flexible, multi-step operations across diverse applications.
Core to agentic workflows is the integration of features such as memory, tools, and decision-making capabilities. Memory allows agents to retain context across tasks, tools enable dynamic interactions with external systems, and decision-making empowers autonomous operation. Together, these elements create intelligent, adaptable processes.
For example, an agentic workflow might automate the entire content creation process, where one agent gathers research, another drafts content, a third edits, and a fourth publishes—all without human intervention. This seamless coordination makes agentic workflows invaluable for tasks like research, project management, and customer service.
By combining autonomy, collaboration, and real-time adaptability, agentic workflows offer a smarter, more efficient approach to tackling complex, multi-step operations.

How AI agentic workflows enhance productivity

AI agentic workflows represent a major evolution from traditional chatbots, offering deeper functionality and adaptability. While chatbots handle linear interactions like answering specific prompts, agentic workflows empower AI to act autonomously, manage multiple tasks, and adapt dynamically to changing contexts.
For example, instead of assisting with a single task, like helping draft an article, an agentic workflow could autonomously generate ideas based on data from the Internet or an internal database, plan an outline, write drafts, refine content, and even publish the final piece - all without human intervention. This ability to handle complex, multi-step operations unlocks new levels of productivity.
Agentic workflows excel in scenarios like research analysis, project management, and content selection and even advanced generation. By proactively coordinating tasks and collaborating with both humans and other agents, they create goal-driven systems that deliver intelligent, streamlined outcomes. This adaptability ensures these workflows are well-suited to diverse and evolving challenges.

Major components and functions of AI agents in agentic workflows

The components and functions of AI agents in agentic workflows are shaped by their implementation and objectives. Building on the foundational concepts introduced earlier, these elements enable agents to operate autonomously, adaptively, and collaboratively.
Here are the key components that underpin effective agentic workflows:

Memory

Memory is essential for AI agents to retain context across tasks or interactions. As discussed above, memory can be session-based for short-term operations or persistent for long-term knowledge retention. For example:
  • Short-term memory: Enables an agent to summarize ongoing research discussions or respond to follow-up questions coherently.
  • Persistent memory: Helps an agent recall project milestones or customer preferences across multiple sessions.
This continuity is vital for maintaining coherence and enabling AI agents to build on prior actions without starting from scratch.
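As a minimal sketch of how these two memory types might be wired together (the class and file names here are illustrative, not part of the tutorial code below):

import json
import os

class AgentMemory:
    """Illustrative combination of short-term and persistent memory."""

    def __init__(self, persist_path="agent_memory.json"):
        self.session = []                 # short-term: lives only for this run
        self.persist_path = persist_path  # persistent: survives across sessions
        self.long_term = {}
        if os.path.exists(persist_path):
            with open(persist_path) as f:
                self.long_term = json.load(f)

    def remember(self, message):
        """Short-term: accumulate context for the current task."""
        self.session.append(message)

    def save_fact(self, key, value):
        """Persistent: retain knowledge (e.g., user preferences) across runs."""
        self.long_term[key] = value
        with open(self.persist_path, "w") as f:
            json.dump(self.long_term, f)

    def context(self):
        """Build a prompt prefix from both memory types for the next LLM call."""
        facts = "\n".join(f"{k}: {v}" for k, v in self.long_term.items())
        return f"Known facts:\n{facts}\n\nConversation so far:\n" + "\n".join(self.session)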

Tools

AI agents leverage external tools and APIs to extend their functionality, a feature highlighted earlier in How AI agentic workflows enhance productivity. These tools allow agents to:
  • Query databases for relevant data.
  • Generate detailed reports or visualizations.
  • Perform real-time calculations or fetch information from web sources.
For instance, in a content creation workflow, one agent might use an API to retrieve trending keywords while another generates optimized content based on that input.
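Here is a minimal sketch of this pattern (search_database and current_time are placeholder tools, not real APIs): the workflow routes an agent's tool request to plain Python functions through a registry.

import datetime

def search_database(query: str) -> str:
    """Placeholder tool: stands in for a real database query."""
    return f"(rows matching '{query}' would be returned here)"

def current_time(_: str = "") -> str:
    """Placeholder tool: returns the current timestamp."""
    return datetime.datetime.now().isoformat()

TOOLS = {
    "search_database": search_database,
    "current_time": current_time,
}

def call_tool(tool_name: str, argument: str) -> str:
    """Dispatch an agent's tool request; unknown tools return an error string
    the agent can recover from instead of crashing the workflow."""
    tool = TOOLS.get(tool_name)
    if tool is None:
        return f"Unknown tool: {tool_name}"
    return tool(argument)

print(call_tool("current_time", ""))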

Orchestration and planning

Orchestration involves coordinating multiple agents or components to complete complex workflows. This mirrors the collaborative capabilities discussed earlier, where agents work together to achieve shared goals. A central system often manages dependencies, resolves conflicts, and ensures that tasks are completed in sequence.
For example:
  • In a project management workflow, one agent creates timelines, another assigns tasks, and a third monitors progress, ensuring all steps align with the overall plan.
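A minimal sketch of this coordination, using the project-management example (agent names and state keys are illustrative): a central orchestrator threads shared state through each agent in dependency order.

def planner(state):
    state["timeline"] = f"Timeline for: {state['goal']}"
    return state

def assigner(state):
    state["assignments"] = f"Tasks assigned based on {state['timeline']}"
    return state

def monitor(state):
    state["status"] = "All steps aligned with the overall plan"
    return state

def orchestrate(goal, agents):
    """Run each agent in sequence, passing the shared state along."""
    state = {"goal": goal}
    for agent in agents:
        state = agent(state)
    return state

result = orchestrate("Launch Q3 report", [planner, assigner, monitor])
print(result["status"])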

Continual learning

Continual learning allows agents to evolve and improve over time, a critical feature for adapting to dynamic environments. By leveraging stored interactions or human-labeled feedback, agents can refine their behavior and performance. Examples include:
  • Self-reflective improvement: Analyzing successful and unsuccessful interactions to optimize decision-making.
  • Prompt enhancement: Updating system prompts with more effective in-context examples to handle future tasks more efficiently.
This iterative improvement aligns with the adaptability and goal-driven nature of agentic workflows described earlier.
A key component of this is integrating tools like W&B Weave, which facilitates comprehensive evaluations and incorporates human feedback into the learning loop. Here’s how:
  • Human-in-the-loop feedback: W&B Weave allows users to monitor agent performance, label successful or unsuccessful outcomes, and provide real-time feedback. For instance:
    • A researcher using an agent to summarize AI papers can flag errors or inaccuracies in the generated summaries.
    • These labeled examples feed back into the system, helping the agent refine its approach for similar tasks in the future.
  • Evaluation metrics: Using Weave, developers can design custom evaluation pipelines to assess agent performance across tasks. Metrics like accuracy, relevance, and latency are logged and visualized, providing insights into areas for improvement.
  • Dynamic prompt optimization: Feedback collected through W&B Weave can guide updates to the agent’s prompts, ensuring they contain better in-context examples. This continuous refinement boosts the system's ability to adapt to new or evolving task requirements.
  • Data-driven iteration: By maintaining detailed logs of interactions and outcomes, W&B Weave helps identify patterns in agent behavior, revealing which workflows yield the best results. These insights enable data-driven updates to both the agent's architecture and its operational strategies.
For example, a workflow involving automated content generation could use Weave to track how well the agent generates drafts based on user-defined criteria, such as tone or topic relevance. Human-in-the-loop corrections improve future iterations, aligning outputs more closely with user expectations.
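As a small sketch of the human-in-the-loop piece, here is how a labeled outcome could be attached to the Weave call that produced it, using the same feedback API the tutorial below relies on (the project name and call ID are placeholders):

import weave

client = weave.init("news_agent")  # project name matches the tutorial below

def record_human_feedback(call_id: str, label: str):
    """Attach a thumbs up/down reaction to the traced call that produced an output."""
    call = client.get_call(call_id)
    call.feedback.add_reaction("👍" if label == "good" else "👎")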

Enhancing agentic workflows with generative AI networks and multi-agent collaboration

A key strategy for designing agentic workflows is to isolate tasks into specialized agents, each responsible for a specific function. This modular architecture not only organizes the inner workings of the workflow but also leverages the strengths of current generative AI models. By breaking down complex workflows into discrete, well-defined tasks, the system becomes more scalable, interpretable, and adaptable.

Why specialization matters

Large language models are most effective when assigned single, focused tasks. Overloading a model by asking it to retrieve information, summarize data, and plan an action sequence in one step often results in inconsistent or suboptimal outputs. For instance:
  • A single agent tasked with retrieving research papers can focus solely on optimizing search queries.
  • Another agent, dedicated to summarization, can use the retrieved data to generate concise and accurate summaries.
By separating responsibilities, each agent can fully utilize the model’s capacity, producing more reliable and precise results while minimizing performance degradation. This also allows for different models to be used for each agent/capability required.
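To illustrate, here is a minimal sketch assuming litellm (which the tutorial below also uses) and placeholder model names: each agent gets one focused task and, if desired, its own model.

import asyncio
from litellm import acompletion

async def run_agent(model: str, system_prompt: str, user_input: str) -> str:
    """One focused LLM call per agent, with a per-agent model choice."""
    response = await acompletion(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input},
        ],
    )
    return response["choices"][0]["message"]["content"]

async def pipeline(topic: str) -> str:
    # Each agent gets a single focused task and its own model.
    draft = await run_agent(
        "gpt-4o-mini",  # cheaper model for the first pass
        "You draft short technical paragraphs.",
        f"Draft a paragraph about {topic}.",
    )
    polished = await run_agent(
        "gpt-4o",  # stronger model for refinement
        "You edit text for clarity and precision.",
        f"Improve this draft:\n{draft}",
    )
    return polished

# asyncio.run(pipeline("efficient LLM inference"))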

Facilitating orchestration and collaboration

A modular approach also enhances workflow orchestration. Specialized agents collaborate via well-defined communication protocols, completing tasks sequentially or in parallel as required. For example:
  • In a content creation pipeline, one agent retrieves trending topics, another drafts content, and a third refines tone and grammar before a final agent schedules publication.
  • This structured collaboration mirrors human societal roles, where teachers, engineers, and doctors perform distinct functions but work together to achieve collective goals.

Advantages of multi-agent systems

  • Scalability: Tasks can be distributed across multiple agents, making it easier to adapt workflows to increased demands or complexity.
  • Fault tolerance: Modular systems make it easier to identify and optimize weak points or replace underperforming agents without disrupting the entire workflow.
  • Interpretability: Each agent’s output can be traced back to its function, simplifying debugging and evaluation.

Bridging strengths and limitations of LLMs

By leveraging generative AI in this task-specific manner, agentic workflows address the inherent limitations of LLMs, such as their difficulty in managing multiple tasks simultaneously. This approach ensures workflows are:
  • Efficient: Resources are allocated based on the task’s specific needs.
  • Adaptable: Agents can be added or modified without significant overhauls.
  • Robust: Failures in one component do not cascade across the entire system.
The modular nature of multi-agent collaboration is not only practical but also transformative. It enables agentic workflows to handle complex, nuanced demands with precision, making them indispensable in modern AI-driven tasks.

Practical applications and benefits of AI agentic workflows

AI agentic workflows have the potential to revolutionize productivity by automating repetitive tasks, enhancing collaboration, and enabling organizations to focus on strategic goals. By leveraging autonomous agents, these workflows streamline operations, reduce manual effort, and empower individuals and teams to achieve more impactful outcomes.

Delegating routine tasks

Agentic workflows allow individuals to offload repetitive or time-consuming responsibilities to AI agents. For example:
  • Routine data analysis: An agent can analyze sales trends or customer behavior, delivering actionable insights directly to decision-makers.
  • Drafting reports: Agents can generate first drafts of business reports, saving time for employees to focus on refining the content or making strategic recommendations.
  • Information retrieval: Research agents can sift through vast datasets or databases to retrieve the most relevant information quickly.
This personalized assistance eliminates mundane responsibilities, allowing employees to concentrate on creative, strategic, or high-priority tasks.

Structuring complex tasks

In organizational settings, agentic workflows bring clarity and structure to intricate operations by dividing them into manageable steps. For instance:
  • Project planning: An agent can create timelines and assign tasks based on resource availability and deadlines.
  • Content creation: In a multi-step process, one agent gathers research, another drafts content, and a third polishes the final product for publication.
  • Customer support: Agents can triage support tickets, resolve simple queries, and escalate complex issues to human representatives.
By ensuring consistency and precision, these workflows enhance overall efficiency and quality.

Enhancing collaboration

Agentic AI workflows facilitate better coordination in projects requiring input from multiple stakeholders or systems. They act as intermediaries, managing dependencies and transitions to reduce miscommunication. For example:
  • In a product development process, agents can synchronize updates from design, engineering, and marketing teams, ensuring everyone stays aligned.
  • Agents also ensure seamless handovers between human teams and other automated systems, minimizing disruptions in workflows.

Scaling and adapting operations

Agentic workflows excel at scaling organizational processes with minimal manual intervention. Their adaptability allows them to:
  • Respond to shifting priorities: Agents can quickly adjust workflows based on new objectives or data inputs.
  • Integrate new tools: New APIs, data sources, or software can be seamlessly added to workflows without disrupting operations.
  • Standardize processes across teams: Successful workflows can be replicated across departments, ensuring uniformity in outcomes and best practices.

Driving innovation

By automating routine tasks, agentic workflows free up resources for creativity and experimentation. Teams can:
  • Prototype new ideas rapidly, using agents to handle data collection and initial analyses.
  • Test different solutions with minimal risk, iterating faster thanks to reduced overhead.
  • Focus on high-level strategy, knowing that operational details are managed effectively by AI.
This fosters a culture of experimentation and growth, where organizations can continuously adapt to evolving market demands or technological advancements.

Tutorial: Building an agentic AI system for research journalism

In this tutorial, we’ll build a personal AI research agent that automates the discovery and summarization of academic papers tailored to your interests. By the end, you’ll have a fully functional system that streamlines research workflows and delivers high-quality, personalized summaries with minimal manual effort.

How it works

The agent follows a structured process to achieve its goals:
  1. Search and selection: The agent searches academic databases like Arxiv, evaluates results using an LLM, and selects the most relevant paper for further analysis.
  2. Content extraction: It extracts key content from the selected paper and identifies areas needing clarification or deeper focus.
  3. Summarization: Using targeted questions, the agent generates a concise, focused summary tailored to your preferences.
  4. Refinement and delivery: The summary is refined for clarity and emailed directly to you.
To make the agent smarter over time, we’ll incorporate a feedback loop. You can rate summaries by replying with "good" or "bad" in the email body:
  • Good: The agent updates its prompt to prioritize similar topics.
  • Bad: It adjusts to avoid similar topics in the future.
This iterative process ensures the agent continually improves at identifying papers aligned with your interests.


System design

The system is built around a series of specialized agents, each defined by a distinct prompt to handle specific stages of the agentic AI workflow:
  • Research selection agent: Evaluates papers and selects the most relevant based on your preferences.
  • Question generation agent: Identifies key areas for clarification to guide summarization.
  • Summarization agent: Creates concise summaries tailored to your style.
  • Editor agent: Refines the final output for readability and coherence.

The research selection prompt


This prompt is used to guide the agent in choosing relevant and impactful research topics. It leverages examples of preferred and less desirable topics to ensure alignment with the user’s specific interests and goals. By focusing on cutting-edge advancements and meaningful applications, the agent is able to tailor its selection process effectively.
You are an AI research assistant tasked with selecting the most relevant and impactful research topic from a list of options. Your goal is to choose a topic that aligns with my preferences, based on previous articles I’ve written. Below are examples of topics I have selected in the past:

"Voyage-Code-3: Smarter, More Efficient Code Retrieval with Nested Embeddings"
"DeepSeek-V3: Training 671 billion parameters with a $6 million dollar budget"
"DeiT Outperforms Experts in Cancer Diagnosis"
"Meta's new LLM architecture: Large Concept Models"
"OpenAI Introduces o3: Pushing the Boundaries of AI Reasoning"
"Meta presents Coconut: Augmenting LLM Reasoning with Latent Thoughts"
"Meta introduces Llama 3.3"
"Google Cloud Introduces Veo and Imagen 3"
"AlphaQubit: Attention is all you need for Quantum Error Correction?"
"OmniVision-968M: A Ultra-lightweight Multimodal Model built on Qwen 2.5"
"Tokenformer: A GPT that uses tokens as parameters"
"Researchers Speed up Diffusion Modeling 17.5x"
"PhysGen: Training-Free Physics Grounding of Image-to-Video Generation"
"Deepmind trains self-correcting LLM's with RL"


These topics typically share the following characteristics:
- Cutting-edge advancements in AI, such as novel architectures, groundbreaking models, or innovative methodologies.
- Large-scale experiments, resource efficiency, or methods that push performance or cost-effectiveness to new limits.
- Real-world applications with transformative potential, such as breakthroughs in healthcare, physics, quantum computing, or multimodal learning.
- A focus on AI reasoning, self-correction, and models that improve through iterative feedback.
- Technical depth that appeals to readers interested in state-of-the-art developments.

Your task:
1. Review the provided list of research results.
2. Analyze each option based on the characteristics above.
3. Select the topic that best aligns with my preferred style and focus areas.
4. Provide a brief explanation of why this topic is the best match, referencing similarities to my previous selections.

Be precise, and ensure the chosen topic reflects both technical innovation and practical significance in AI research.


Note, here are some previous articles that you have selected that I really am not that interested in -- use these to guide your decision making:

########### beginning of negative preference list
- Papers you are less interested in ...

The question generation prompt

This prompt allows the agent to produce thoughtful questions based on the content of the research paper. These questions guide the summarization process by identifying areas that may require clarification or additional focus, enhancing the overall depth and relevance of the summary.
You are an AI assistant. Based on the following text from a research paper, generate a list of major questions that:
1. A reader might want to ask about the paper.
2. Address potential areas of confusion or key points that need clarification.

Please provide a list of meaningful and insightful questions. -- respond with just the questions!

The summarization prompt


After selecting a topic, this prompt enables the agent to generate concise and informative summaries of the research paper. It ensures the summary captures key ideas, contributions, results, and implications while addressing any user-defined questions or points of interest.
You are an AI research assistant. Your task is to summarize a research paper based on its content and a list of key questions. The summary should be 300-500 words long, and it must not only summarize the paper’s main ideas, contributions, and results but also attempt to address the provided questions.

Instructions:
1. Write a clear, concise, and comprehensive summary of the paper in 300-500 words.
2. Ensure the summary highlights:
- The core problem or challenge the paper addresses.
- The methods, models, or experiments proposed or used in the paper.
- The key results and insights obtained from the research.
- The implications or potential applications of the research.
3. Where possible, answer the questions provided based on the content of the paper. If the questions cannot be fully answered, summarize what is known from the provided content.

Be precise and focus on the most important points, ensuring the summary is easy to understand for a technical audience.

I will also give you a previous article that I want you to use as context/as a guide to write the new article.

The editor prompt


This prompt focuses on refining the generated summary to ensure it meets specific stylistic and structural requirements. By improving readability, adding headers, and maintaining a clear flow, the editor step ensures that the final summary aligns with the user’s preferred tone and presentation style.
You are an AI article editor. Your task is to rewrite the provided article to:
1. Ensure all sections have proper headers (unstyled, ending with a colon).
2. Meet the desired word count range (300-500 words).
3. Improve the overall structure and flow of the content.
4. Do not include bullets or lists.

Return only the new article with the added changes. If no changes are required, just return the full article.

Organizing prompts

For this project, the most effective way to organize the prompts is to store each one in its own text file. Embedding long prompts directly in the Python logic file is clunky, since that file is designed for code rather than long natural-language strings.
Keeping each prompt in a separate text file makes it easier to manage, modify, and test prompts without touching the main workflow. This streamlines development and keeps the prompts accessible and well-structured for iterative improvement.
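As an illustration, the project layout might look like this (research_agent.py and feedback_handler.py are placeholder names for the two scripts below; the prompt file names match those used in the code):

news_agent/
├── research_agent.py              # main workflow (shown below)
├── feedback_handler.py            # feedback script (shown later)
├── select_research_prompt.txt
├── generate_questions_prompt.txt
├── summary_prompt.txt
├── editor_prompt.txt
└── article1.txt ... article3.txt  # reference articles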

The code

Now we will write a script that automates the process of discovering, analyzing, and summarizing research papers using these prompts. It will search Arxiv for papers based on a specified topic and use an LLM to evaluate and select the most relevant paper. After downloading the paper, it will extract content from the first ten pages and generate targeted questions to guide the summarization process. Using these questions and the extracted content, the script will produce a refined summary that aligns with user preferences.
The final summaries will be sent via email to my inbox, providing an easy way to review the results. To enable the email-sending functionality, you need to use an app password for your email account instead of your regular password.
For Gmail, this involves enabling 2-Step Verification in your Google account, accessing the App Passwords section, and generating a password to use in the script. App passwords are a secure way to grant access while keeping your main credentials safe. If you use a different email client, the steps to generate an app password may vary depending on the provider.
Here's the code for our news agent:
import arxiv
import asyncio
import os
from PyPDF2 import PdfReader
from litellm import acompletion
import weave
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
import json
import wandb

# Initialize Weave
weave.init("news_agent")

# Email configuration
email = "your_email@gmail.com"
pswd = "your app password"
LAST_EMAIL_FILE = "last_email.json"

# Helper function: Run model inference
async def run_inference(query):
    api_key = os.getenv("OPENAI_API_KEY")
    model_name = "gpt-4o-mini"
    response = await acompletion(
        model=model_name,
        api_key=api_key,
        messages=[{"role": "user", "content": query}],
        temperature=0.7,
        max_tokens=1024,
    )
    return response["choices"][0]["message"]["content"]

# Helper function: Read a prompt from a file
def read_prompt(file_path):
    with open(file_path, "r") as file:
        return file.read()

# Helper function: Read a reference article
def read_reference_article(article_file):
    if os.path.exists(article_file):
        with open(article_file, "r") as file:
            return file.read().strip()
    return ""

# Helper function: Format Arxiv results
def format_arxiv_results(results):
    return "[" + ",\n".join(
        [
            f'{{"index": {i+1}, "title": "{result["title"]}", "summary": "{result["summary"]}", "url": "{result["url"]}"}}'
            for i, result in enumerate(results)
        ]
    ) + "]"

# Helper function: Convert Arxiv URL to PDF URL
def convert_to_pdf_url(abs_url):
    return abs_url.replace("/abs/", "/pdf/")

# Helper function: Read the first 10 pages of a PDF
def read_pdf_first_10_pages(pdf_path):
    try:
        with open(pdf_path, "rb") as file:
            reader = PdfReader(file)
            # extract_text can return None for some pages, so guard with "or ''"
            return "\n".join(page.extract_text() or "" for page in reader.pages[:10])
    except Exception:
        return ""

# Arxiv search function
@weave.op
def get_arxiv_possibilities(query, max_results=20):
    search = arxiv.Search(
        query=query,
        max_results=max_results,
        sort_by=arxiv.SortCriterion.SubmittedDate,
    )
    # Note: newer versions of the arxiv package prefer arxiv.Client().results(search)
    return [
        {"title": result.title, "summary": result.summary.replace("\n", " "), "url": result.entry_id}
        for result in search.results()
    ]

# Select the best Arxiv paper with call ID
@weave.op
async def select_best_arxiv_paper(possibilities, prompt_file):
    if not possibilities:
        return None, None, None

    # Get the Weave call ID
    call_id = weave.get_current_call().id
    formatted_results = format_arxiv_results(possibilities)
    selection_prompt = read_prompt(prompt_file)
    query = f"{selection_prompt}\n\nSearch Results:\n{formatted_results}\n\nRespond with ONLY the URL of the paper you recommend, nothing else."
    selected_response = await run_inference(query)
    selected_url = selected_response.strip()

    # Match the selected URL to possibilities for additional details
    selected_paper = next((item for item in possibilities if item["url"] in selected_url), None)
    return (
        convert_to_pdf_url(selected_url) if selected_url.startswith("http") and "/abs/" in selected_url else None,
        selected_paper["title"] if selected_paper else None,
        call_id,  # Return the Weave call ID
    )

# Generate questions from the paper content
@weave.op
async def generate_questions_from_paper(paper_text, prompt_file):
    question_prompt = read_prompt(prompt_file)
    prompt = f"{question_prompt}\n\nText:\n{paper_text}\n\nPlease provide a list of questions."
    return await run_inference(prompt)

# Generate a summary of the paper
@weave.op
async def generate_summary_from_paper(paper_text, questions, summary_prompt_file, reference_text):
    summary_prompt = read_prompt(summary_prompt_file)
    prompt = (
        f"{summary_prompt}\n\nPREVIOUS Reference Article:\n{reference_text}\n\n"
        f"List of Questions to address in the article:\n{questions}\n\nPaper Content:\n{paper_text}"
    )
    return await run_inference(prompt)

# Edit the generated summary
@weave.op
async def edit_summary(summary, editor_prompt_file):
    editor_prompt = read_prompt(editor_prompt_file)
    prompt = f"{editor_prompt}\n\nArticle Content:\n{summary}"
    return await run_inference(prompt)

# Save email details to a file
async def save_last_email(subject, body, call_id=None, call_url=None):
    """Save the subject, body, Weave call ID, and call URL of the last email sent."""
    with open(LAST_EMAIL_FILE, "w") as f:
        json.dump({"subject": subject, "body": body, "call_id": call_id, "call_url": call_url}, f)
    print(f"Saved last email with call ID: {call_id} and URL: {call_url}")


def get_wandb_username():
    try:
        # Initialize the W&B API
        api = wandb.Api()
        # Fetch the username of the authenticated user
        return api.default_entity
    except Exception as e:
        print(f"Error fetching W&B username: {e}")
        return "unknown_user"

# Send an email
async def send_email(subject, body, recipient_email, sender_email, sender_password, main_call_id=None, selection_call_id=None):
    try:
        # Dynamically fetch W&B username
        username = get_wandb_username()
        call_url = f"https://wandb.ai/{username}/news_agent/r/call/{main_call_id}" if main_call_id else None

        msg = MIMEMultipart()
        msg["From"] = sender_email
        msg["To"] = recipient_email
        msg["Subject"] = subject
        msg.attach(MIMEText(f"{body}\n\nView the process log: {call_url}", "plain"))

        with smtplib.SMTP("smtp.gmail.com", 587) as server:
            server.starttls()
            server.login(sender_email, sender_password)
            server.sendmail(sender_email, recipient_email, msg.as_string())

        print(f"Email sent to {recipient_email}")
        await save_last_email(subject, body, selection_call_id, call_url)
    except Exception as e:
        print(f"Failed to send email: {e}")

# Main function
@weave.op
async def main():
    main_call_id = weave.get_current_call().id

    topic = "machine learning"
    select_prompt_file = "select_research_prompt.txt"
    question_prompt_file = "generate_questions_prompt.txt"
    summary_prompt_file = "summary_prompt.txt"
    editor_prompt_file = "editor_prompt.txt"
    reference_files = ["article1.txt", "article2.txt", "article3.txt"]

    # Step 1: Get Arxiv possibilities
    print("Searching Arxiv...")
    possibilities = get_arxiv_possibilities(topic, max_results=20)

    # Step 2: Select the best paper
    pdf_url, selected_title, selection_call_id = await select_best_arxiv_paper(possibilities, select_prompt_file)
    if not pdf_url:
        print("No paper selected.")
        return

    print(f"Selected Paper: {selected_title}")
    pdf_path = f"{pdf_url.split('/')[-1]}.pdf"
    os.system(f"curl -L {pdf_url} -o {pdf_path}")

    # Step 3: Extract content and generate questions
    paper_text = read_pdf_first_10_pages(pdf_path)
    if not paper_text.strip():
        print("Could not extract any text from the PDF.")
        return

    print("Generating questions based on the paper content...")
    questions = await generate_questions_from_paper(paper_text, question_prompt_file)

    # Step 4: Generate summaries
    all_summaries = ""
    for idx, ref_file in enumerate(reference_files):
        reference_text = read_reference_article(ref_file)
        if not reference_text:
            print(f"Reference article {ref_file} is missing or empty. Skipping...")
            continue

        print(f"\n=== Generating Summary {idx + 1} based on {ref_file} ===")
        summary_output = await generate_summary_from_paper(paper_text, questions, summary_prompt_file, reference_text)

        print(f"Editing Summary {idx + 1}...")
        edited_summary = await edit_summary(summary_output, editor_prompt_file)

        all_summaries += f"=== Edited Summary {idx + 1} based on {ref_file} ===\n{edited_summary}\n\n"

    # Step 5: Email the summaries
    print("Sending email with summaries...")
    await send_email(
        subject=f"{selected_title}",
        body=all_summaries,
        recipient_email=email,
        sender_email=email,
        sender_password=pswd,
        main_call_id=main_call_id,
        selection_call_id=selection_call_id
    )

# Run the main function
asyncio.run(main())

This script creates an AI-driven research assistant that automates the discovery, analysis, and summarization of academic papers. By combining a sequential workflow with article conditioning, the assistant delivers results tailored to user preferences and creative writing styles. Here's how it works:
  1. Search for relevant papers
    1. The get_arxiv_possibilities function queries Arxiv for papers on a specified topic.
    2. Using the @weave.op decorator, the system logs the query inputs and results to ensure transparency and reproducibility.
  2. Select the best paper
    1. The select_best_arxiv_paper function evaluates the search results using an LLM.
    2. Based on a user-defined prompt, the system selects the most relevant paper, ensuring alignment with past interests and preferences.
  3. Extract content from the paper
    1. The script downloads the selected paper's PDF and extracts the first ten pages using PyPDF2.
    2. This extracted text becomes the basis for subsequent analysis.
  4. Generate targeted questions
    1. The generate_questions_from_paper function creates questions to guide the summarization process.
    2. These questions focus on areas requiring clarification or deeper insights, ensuring the summary is meaningful and comprehensive.
  5. Generate the summary
    1. The generate_summary_from_paper function uses reference articles provided by the user to shape the tone, structure, and style of the summary.
    2. Incorporating these references allows the system to explore diverse angles and enrich the output while reflecting the user's preferences.
  6. Refine the summary
    1. The edit_summary function polishes the draft, ensuring it meets word count requirements and adheres to user-defined stylistic guidelines.
    2. The refined summary is ready for delivery.
  7. Send and track the summary
    1. The send_email function compiles the polished summaries and sends them via email.
    2. Details of the last email, including its subject, body, and Weave call ID, are stored in the last_email.json file for traceability.
  8. Incorporate user feedback
    1. Users can provide feedback by replying to the email with "good" or "bad."
    2. The system updates its internal prompts to prioritize or avoid similar topics based on this feedback, fostering continuous improvement.
    3. Feedback is linked directly to the associated email and its generation context, ensuring ongoing optimization.


Catching bugs with W&B Weave

After viewing the overall path of inference calls inside Weave, I noticed that my script was calling generate_questions_from_paper multiple times for a single paper, once for each summary it generated. It was immediately clear that this was unnecessary, since the questions depend only on the paper itself.

After noticing this, I was able to simplify the workflow, ensuring that the generate_questions_from_paper function was invoked only once. This reduced not only the runtime but also the token consumption for each execution, as reflected in the updated traces.
The optimized workflow allowed the agent to proceed with generating summaries for each reference article without redoing unnecessary computations, resulting in a more efficient and scalable system.
Here is an overview of the improved workflow displayed clearly with Weave:

By consolidating question generation into a single step, I significantly lowered the cost of running the agent while maintaining its output quality. Weave's interface, with its clear visualizations and cost breakdowns, played a key role in realizing the need for this optimization. It highlighted exactly where resources were being misallocated and provided the insights I needed to address the issue effectively. This experience reinforced the value of Weave in debugging and fine-tuning multi-step processes in agentic AI workflows.

Tracking and saving costs with Weave

Weave's ability to display token usage and cost per function is another valuable tool for cost optimization. The detailed breakdown helped us understand how individual components of the agent contributed to the overall cost. This feature is useful for iterating on agent designs and ensures that optimizations are not only functional but also cost-effective.
The traces made it easy to visualize where the bottlenecks were, allowing us to improve the agent's performance and efficiency without needing extensive trial and error.


Adding a feedback mechanism for continual learning

To improve the system’s performance dynamically, we’ll build a script that captures user feedback seamlessly. This script automates the process of monitoring emails for feedback about the relevance and quality of selected research topics, making it easy for users to provide actionable input.

How it works

  1. Feedback detection: The script scans emails sent by the user to themselves and detects concise feedback, such as "good" or "bad."
  2. Adjusting system behavior:
    1. If the feedback is "good", the title of the last selected research paper is added to the "good" section of the prompt file, ensuring similar topics are prioritized in the future.
    2. If the feedback is "bad", the title is appended to the "bad" section, helping the system avoid less desirable topics.
  3. Enhancing decision-making: This feedback loop enables the system to adapt quickly to evolving user preferences, significantly enhancing its ability to select relevant research topics.
  4. Weave integration: Using the Call ID stored in the last_email.json file, feedback is also logged in Weave. This allows you to track feedback for specific research selections and gain insights into the system’s performance over time.
By incorporating this automated feedback loop, the system ensures continuous refinement and alignment with the user’s needs.
import json
import imaplib
import email as em
import weave

# Initialize Weave
wv_client = weave.init("news_agent")

# Configuration
EMAIL_ACCOUNT = "your_email@gmail.com"
EMAIL_PASSWORD = "your app password"
LAST_EMAIL_FILE = "last_email.json"
PROMPT_FILE = "select_research_prompt.txt"

def load_last_email():
    """Load the last email details."""
    try:
        with open(LAST_EMAIL_FILE, "r") as f:
            return json.load(f)
    except FileNotFoundError:
        # Default keys match what save_last_email writes in the main script
        return {"subject": None, "call_id": None}

def update_prompt(paper_title, feedback):
    """Update the prompt file based on feedback."""
    if not paper_title:
        print("No paper title available to update the prompt.")
        return

    with open(PROMPT_FILE, "r") as f:
        lines = f.readlines()

    # Check if the title already exists in the file
    if any(paper_title in line for line in lines):
        print(f"The paper title '{paper_title}' is already in the prompt file. Skipping update.")
        return

    if feedback == "good":
        lines.insert(2, f'"{paper_title}"\n')  # Add to the top of the "good" section
    elif feedback == "bad":
        lines.append(f'"{paper_title}"\n')  # Add to the "bad" section

    with open(PROMPT_FILE, "w") as f:
        f.writelines(lines)
    print(f"Updated prompt with {feedback} feedback for: {paper_title}")

def log_feedback_in_weave(call_id, feedback):
    """Log feedback in Weave."""
    try:
        if 'good' in feedback.lower():
            wv_client.get_call(call_id).feedback.add_reaction("👍")
        elif 'bad' in feedback.lower():
            wv_client.get_call(call_id).feedback.add_reaction("👎")
        print(f"Weave feedback logged: {feedback} for call ID {call_id}")
    except Exception as e:
        print(f"Failed to log feedback in Weave: {e}")

def get_email_body(msg):
    """Extract the plain text body of an email."""
    if msg.is_multipart():
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                return part.get_payload(decode=True).decode()
        return ""  # No plain-text part found
    return msg.get_payload(decode=True).decode()

def check_latest_email():
    """Check the user's latest email for 'good' or 'bad' feedback."""
    # Connect to Gmail
    mail = imaplib.IMAP4_SSL("imap.gmail.com")
    mail.login(EMAIL_ACCOUNT, EMAIL_PASSWORD)
    mail.select("inbox")

    # Search for emails sent by the user to themselves
    status, messages = mail.search(None, f'(FROM "{EMAIL_ACCOUNT}" TO "{EMAIL_ACCOUNT}")')
    email_ids = messages[0].split()

    if not email_ids:
        print("No emails found.")
        return None

    # Fetch the latest email
    latest_email_id = email_ids[-1]
    status, msg_data = mail.fetch(latest_email_id, "(RFC822)")
    raw_email = msg_data[0][1]
    msg = em.message_from_bytes(raw_email)

    # Get the body of the email
    body = get_email_body(msg).strip()
    print(f"Latest email body: {body}")

    # Ignore emails longer than 10 characters
    if len(body) > 10:
        print("Email body is longer than 10 characters. Ignoring.")
        return None

    # Check for feedback
    if "good" in body.lower():
        return "good"
    elif "bad" in body.lower():
        return "bad"
    else:
        print("No actionable feedback found in the email.")
        return None

def main():
    # Load the last sent email
    last_email = load_last_email()
    paper_title = last_email.get("subject")
    call_id = last_email.get("call_id")

    if not paper_title:
        print("No last email title found. Exiting.")
        return

    if not call_id:
        print("No Weave call ID found for the last email. Exiting.")
        return

    # Check for feedback in the latest email
    feedback = check_latest_email()

    if feedback:
        # Update the prompt based on feedback
        update_prompt(paper_title, feedback)

        # Log feedback in Weave
        log_feedback_in_weave(call_id, feedback)

if __name__ == "__main__":
    main()

This script streamlines the process of integrating user feedback to enhance the AI research system's performance. By accessing the user's inbox, it identifies recent emails sent to themselves and looks for feedback such as "good" or "bad."
If the feedback is "good," the script adds the last research paper's title to the preferred section of the prompt file to prioritize similar topics. If "bad" feedback is detected, the title is appended to a "negative example" section of the prompt, helping the system avoid less desirable subjects. Additionally, we add feedback to the exact trace inside Weave corresponding to the function that selected the specific research paper. This enables us to track the data over time and retrain models to select papers more effectively.
By automating this feedback mechanism, the system evolves in alignment with user preferences, optimizing its decision-making and maintaining relevance with minimal user intervention.
To demonstrate how this feedback system works, we'll manually run the script, send feedback via email, and then view the updated prompt file. While these scripts are intended to run on a schedule in practice, we'll execute them by hand here for illustration.
First, we'll run the research agent script, which selects a relevant research paper based on our preferences and sends a summary by email. Here's the email:

Note that we are also storing these summaries inside Weave, which is ideal for future use cases where we may want to download all of the summaries to fine-tune a new model.
Next, we open our email client and send an email with the body containing the word "good" to give the system feedback. We'll ensure the body text is no longer than 10 characters, as the script is designed to ignore longer messages (to prevent mistaking a summary for feedback):


After sending the feedback, we run the feedback handler script. This script will fetch the latest email, detect the “good” feedback, and append the title of the paper to the examples section of the select_research_prompt.txt file. Once the script is complete, we can open the prompt file and verify that the paper title has been added to the "good" section of the prompt:

Additionally, the system will log this feedback to Weave, as shown below with the "thumbs up" 👍 emoji in the top-right:

The above scripts can be scheduled to run at regular intervals using a task scheduler such as cron on Unix-based systems, Task Scheduler on Windows, or a cloud-based solution. For example, the research agent script can be configured to run every 24 hours to discover, analyze, and summarize new research papers and send the results to a specified email address.
The feedback handler script can be set to run about 23 hours after the research agent script. This ensures that any feedback provided on the previous summary has been processed and incorporated into the system's preferences before the next set of summaries is generated.
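For example, assuming both scripts live in ~/news_agent and are saved as research_agent.py and feedback_handler.py (placeholder names and paths), the schedule described above could be expressed as a crontab on a Unix system:

# Run the research agent daily at 08:00
0 8 * * * cd ~/news_agent && python research_agent.py
# Run the feedback handler 23 hours later, at 07:00 the following day
0 7 * * * cd ~/news_agent && python feedback_handler.py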

Conclusion

This article demonstrates the construction of an AI research agent capable of automating the discovery, analysis, and summarization of academic papers. By leveraging the structured workflow of the agent, which includes selecting papers, generating questions, creating summaries, and refining output through contextual conditioning, users can streamline their research processes and enhance productivity.
The integration of a feedback loop highlights the agent’s adaptability, allowing it to evolve in response to user preferences. By providing simple feedback such as "good" or "bad," users directly influence the agent’s decision-making, making the system more personalized and effective over time. This combination of automation and human oversight reduces manual effort while maintaining alignment with individual needs.
Beyond its immediate utility, this project illustrates the transformative potential of combining generative AI, multi-agent systems, and iterative feedback. Whether for researchers, professionals, or enthusiasts, this workflow demonstrates how technology can enhance productivity while preserving a human-centric approach to creativity and decision-making. With further refinement and scheduled automation, this agent could serve as a powerful tool for navigating and managing complex information in today’s fast-paced, data-rich world.


Iterate on AI agents and models faster. Try Weights & Biases today.