LLM Apps: Evaluation

Develop techniques for building, optimizing, and scaling AI evaluators with minimal human input. Learn to build reliable evaluation pipelines for LLM applications by combining programmatic checks with LLM-based judges.
2 Hours
Free

Learnings & outcomes

At the end of the course you will:

  • Understand key principles, implementation methods, and appropriate use cases for LLM application evaluation
  • Learn how to create a working LLM as a judge (a brief sketch follows this list)
  • Align your auto-evaluations with minimal human input
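
A minimal illustration of that judge pattern, pairing a cheap programmatic check with an LLM judge, as the course description outlines. This is a sketch, not course code: the model name, prompt wording, and 1-5 rubric are assumptions, and any OpenAI-compatible client would work in place of the one shown.

```python
import json
from openai import OpenAI  # assumes an OpenAI-style client is installed

client = OpenAI()

def programmatic_check(output: str) -> bool:
    """Cheap deterministic gate: reject empty or over-long answers."""
    return 0 < len(output) <= 2000

def llm_judge(question: str, output: str) -> dict:
    """Ask a judge model for a 1-5 correctness score plus a one-line reason."""
    prompt = (
        "Rate the answer to the question on a 1-5 scale for correctness.\n"
        f"Question: {question}\nAnswer: {output}\n"
        'Reply as JSON: {"score": <1-5>, "reason": "<one sentence>"}'
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed judge model; any capable model works
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # request parseable JSON
    )
    return json.loads(resp.choices[0].message.content)

def evaluate(question: str, output: str) -> dict:
    # Run the free check first so obviously bad outputs never cost a judge call.
    if not programmatic_check(output):
        return {"score": 1, "reason": "failed programmatic check"}
    return llm_judge(question, output)
```

Ordering the checks this way keeps the deterministic, zero-cost rules in front of the paid judge call, which is the usual way to keep evaluation pipelines cheap at scale.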

Curriculum

  • Welcome to the course
  • Evaluation basics
  • Programmatic and LLM Evaluations
  • Alignment (an agreement-measurement sketch follows this list)
  • Case study: Google: Imagen and Veo
  • Case study: OpenHands
  • Automatic LLM Evaluators
  • Conclusion and course assignment
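
To make the Alignment item above concrete: aligning an auto-evaluator usually means comparing its verdicts against a small set of human labels and iterating until agreement is acceptable. The sketch below uses made-up labels purely for illustration; scikit-learn's cohen_kappa_score provides a chance-corrected agreement number.

```python
# Hypothetical labels for illustration; in practice you would sample real
# traces from your application and have humans label a small subset.
from sklearn.metrics import cohen_kappa_score

human_labels = ["pass", "fail", "pass", "pass", "fail", "pass"]
judge_labels = ["pass", "fail", "pass", "fail", "fail", "pass"]

# Raw agreement is easy to read; kappa corrects for agreement by chance.
agreement = sum(h == j for h, j in zip(human_labels, judge_labels)) / len(human_labels)
kappa = cohen_kappa_score(human_labels, judge_labels)

print(f"raw agreement: {agreement:.2f}, Cohen's kappa: {kappa:.2f}")
```

If the numbers come out low, the usual loop is to refine the judge's prompt or rubric and re-measure, so the human labeling effort stays small.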
In partnership with
Reviews
Good collection of information about LLM evaluation. Even though it still has the limitation that subjective questions are hard to evaluate, it is a good starting point for an LLM evaluation system that can evaluate LLM applications automatically.
Great learning at Weights & Biases. It was a great learning opportunity, and at a time when AI is at its peak it is really helpful to know more about it.
Course instructors

Ayush Thakur

Ayush Thakur is an AI Engineer at Weights & Biases and a Google Developer Expert in Machine Learning (TensorFlow). He is interested in all things computer vision and representation learning. For the past two years he has been working with LLMs, covering RLHF and the how and what of building LLM-based systems.
AI Engineer, Weights & Biases

Anish Shah

Anish loves turning ML ideas into ML products. He started his career working with multiple data science teams within SAP, covering traditional ML, deep learning, and recommendation systems before landing at Weights & Biases. With the art of programming and a little bit of magic, Anish crafts ML projects to better serve our customers, turning "oh no"s into "a-ha"s!
AI Engineer, Weights & Biases

Paige Bailey

Paige Bailey is the engineering lead for GenAI Developer Experience at Google. Paige has a deep understanding of the generative AI landscape, having previously served as an applied machine learning engineer at Microsoft and GitHub, and a product lead for Google's PaLM v2 and Gemini models. Paige is passionate about making cutting-edge AI technology accessible, and empowering developers to build the next generation of innovative applications.
GenAI Developer Experience, Google

Graham Neubig

Graham Neubig is an Associate Professor at Carnegie Mellon University, and Chief Scientist at All Hands AI. His research work focuses on AI agents for web browsing and code generation, as well as improvements to LLMs for multilingual and multimodal applications. He is a big proponent of open source and open science, including the OpenHands framework for software engineering agents, developed by All Hands AI.
Chief Scientist, All Hands AI; Associate Professor, Carnegie Mellon University
Explore our other courses

Ready to get into another course?

If you are ready to dive into another LLM course, check out our latest RAG++ course in collaboration with Cohere and Weaviate.

Practical RAG techniques for engineers: learn production-ready solutions from industry experts to optimize performance, cut costs, and enhance the accuracy and relevance of your applications.