LLM apps: Evaluation

Develop techniques for building, optimizing, and scaling AI evaluators with minimal human input. Learn to build reliable evaluation pipelines for LLM applications by combining programmatic checks with LLM-based judges.
2 Hours
Free

Learnings & outcomes

At the end of the course you will:

  • Understand key principles, implementation methods, and appropriate use cases for LLM application evaluation
  • Know how to create a working LLM-as-a-judge evaluator
  • Be able to align your automated evaluations with minimal human input

Curriculum

  • Welcome to the course
  • Evaluation basics
  • Programmatic and LLM Evaluations
  • Alignment
  • Case study: Google Imagen and Veo
  • Case study: OpenHands
  • Automatic LLM Evaluators
  • Conclusion and course assignment
In partnership with
Reviews
Extremely Valuable and Well-Structured Course!
This course exceeded my expectations in every way. It provided a perfect balance between theory and hands-on practice, allowing me to deeply understand how to evaluate LLM applications effectively. The real-world examples and step-by-step projects made complex topics like BLEU, ROUGE, F1 Score, and human evaluation strategies feel approachable and actionable. I especially appreciated the focus on building practical workflows using tools like Weights & Biases, and the constant emphasis on scalability and real-world application. Highly recommend this course to anyone serious about working with LLMs!
Good collection of information about LLM Evaluation.
Even though it still has some limitations, such as the difficulty of evaluating subjective questions, it is a good starting point for building an LLM evaluation system that can evaluate LLM applications automatically.
Great learning at Weights & Biases.
It was a great learning opportunity, and at a time when AI is at its peak, it's really helpful to know more about it.
Course instructors

Ayush Thakur

Weights & Biases
AI Engineer
Ayush Thakur is an AI Engineer at Weights & Biases and a Google Developer Expert in Machine Learning (TensorFlow). He is interested in everything computer vision and representation learning. For the past two years he has been working with LLMs, covering RLHF and the what and how of building LLM-based systems.