Announcing our newest GenAI course AI Engineering: Agents
On 2 June 2025 we opened the doors to our newest course, AI Engineering: Agents. If you’ve ever wondered how to get from a reliable prompt chain to a fleet of autonomous, memory‑aware GPT agents, this course is for you.
Created on June 3 | Last edited on June 4
We're thrilled to release our newest course, "AI Engineering: Agents," in partnership with OpenAI. Like all courses in our AI Academy, it's completely free from start to finish. You'll learn how to design reliable agent architectures that combine tool integration, memory systems, and autonomous workflows; master multi-agent collaboration through orchestrator-worker patterns and structured hand-offs; evaluate agent performance across accuracy, latency, and cost using reproducible benchmarking methods; and a whole lot more. Click below to register.
Register for the free Agents course
Here's a brief rundown of some core concepts and what we cover in this course:
What are Agents?
Implementing a single LLM call in your application creates a fixed pattern: the user supplies input, your code runs a predetermined sequence of steps, and the model returns the result. But what happens when the right steps depend on the user’s request? That's where an autonomous AI agent is the better choice.
In this course we define an agent as an LLM‑powered process that plans its own steps, selects and invokes external tools, and iteratively refines its work to achieve a high‑level objective without hard‑coded instructions.
Depending on your use case, you might deploy a single agent or orchestrate several working together. Each agent determines the steps it needs, makes the tool calls, and returns the output.
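That plan-act-observe loop can be sketched in a few lines. This is a hypothetical illustration, not the course's actual code: `fake_llm` stands in for a real chat-completion call and is stubbed so the example runs offline, and the tool registry is a plain dict.

```python
# Minimal agent-loop sketch. The "LLM" is stubbed: it first plans a tool
# call, then, once it sees a tool result, emits a final answer.

def fake_llm(messages):
    """Stand-in for a real chat-model call; returns an 'action' dict."""
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool", "name": "get_weather", "args": {"city": "Paris"}}
    return {"type": "final", "answer": "Pack an umbrella: rain expected in Paris."}

# Tool registry: the agent selects from these by name.
TOOLS = {
    "get_weather": lambda city: f"Forecast for {city}: rain, 14C",
}

def run_agent(task, llm=fake_llm, max_steps=5):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):            # bound the loop: agents can wander
        action = llm(messages)
        if action["type"] == "final":     # the model decides when it is done
            return action["answer"]
        result = TOOLS[action["name"]](**action["args"])  # invoke the chosen tool
        messages.append({"role": "tool", "content": result})
    return "Stopped: step budget exhausted."

print(run_agent("What should I pack for Paris?"))
```

The key contrast with a hard-coded workflow: the loop body is generic, and the model, not your code, picks which tool to call next and when to stop.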

A scale showing the trade-off between control/consistency and flexibility, from deterministic workflows (hard-coded, single LLM calls) to agentic systems.
Will I always need a multi‑agent setup?
The first thing you’ll learn in the course is how to recognize when a simple LLM call is enough, when a standalone agent is ideal, and when a full multi‑agent pipeline is warranted. Across six modules—mixing theory with hands‑on code—you’ll explore the strengths and trade‑offs of each pattern so you can pick the right one every time.

Evaluation
We teach an evaluation‑first mindset, and agents are no exception. The challenge is deciding how to evaluate them. Think of evaluation like tailoring a suit: an off‑the‑rack benchmark is a good start, but the perfect fit comes from a custom metric aligned with your task.
Because agents chart their own paths, large‑scale evaluation is tricky. We track surface metrics—tool‑call counts, tokens, latency—and introduce the idea of using an LLM as a judge for deeper quality checks. (You’ll dive into this in Module 5!)
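To make this concrete, here is a hedged sketch of combining those two layers: surface metrics logged per run, plus a quality score from an LLM judge. Everything here is a stand-in (the `RunMetrics` record, the `fake_judge` keyword check, the latency budget); a real judge would be another model call with a rubric.

```python
# Sketch: evaluate one agent run on surface metrics (tool calls, tokens,
# latency) and on answer quality via a stubbed "LLM as judge".

from dataclasses import dataclass

@dataclass
class RunMetrics:
    tool_calls: int
    tokens: int
    latency_s: float

def fake_judge(question, answer):
    """Stand-in for a judge-model call; returns a 1-5 quality score."""
    return 5 if "umbrella" in answer else 2

def evaluate(question, answer, metrics, max_latency_s=10.0):
    return {
        "quality": fake_judge(question, answer),         # deeper quality check
        "within_latency_budget": metrics.latency_s <= max_latency_s,
        "tool_calls": metrics.tool_calls,                # surface metrics
        "tokens": metrics.tokens,
    }

report = evaluate(
    "What should I pack for Paris?",
    "Pack an umbrella: rain expected.",
    RunMetrics(tool_calls=1, tokens=350, latency_s=2.4),
)
print(report)
```

Aggregating such per-run reports over a fixed question set is what turns ad-hoc spot checks into a reproducible benchmark.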

Module 5: Evaluation and Benchmarking - measuring, improving & trusting Agents
What you will build in the course
All lesson code is available in the course repository. After each theory section you’ll apply your new skills to a focused project. You’ll start with an email‑writing agent, move on to an agent with weather memory, and finish with a travel‑planning assistant: several specialized agents collaborating to design your ideal holiday.
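The shape of that final project is the orchestrator-worker pattern mentioned above. A hypothetical sketch, with the workers stubbed as plain functions so it runs offline (in the course, each worker is itself an agent and the plan comes from an LLM):

```python
# Orchestrator-worker sketch: one orchestrator routes sub-tasks to
# specialized workers and merges their structured results.

def flight_agent(request):
    return f"Flight found for {request['destination']}"

def hotel_agent(request):
    return f"Hotel booked in {request['destination']}"

WORKERS = {"flights": flight_agent, "hotels": hotel_agent}

def orchestrator(request):
    plan = ["flights", "hotels"]          # in practice an LLM produces this plan
    results = {task: WORKERS[task](request) for task in plan}  # hand-offs
    return " | ".join(results.values())

print(orchestrator({"destination": "Lisbon"}))
# → "Flight found for Lisbon | Hotel booked in Lisbon"
```

The design choice worth noticing: each worker sees only its sub-task, which keeps prompts short and lets you test, evaluate, and swap workers independently.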

Structure of the final Travel Assistant agent course participants will build.
Iterate on AI agents and models faster. Try Weights & Biases today.