Judgment Day Hackathon - Building LLM Judges

September 21-22
San Francisco, CA

Evaluating LLM outputs accurately is critical to iterating quickly on an LLM system. Human annotation can be slow and expensive, and using LLMs as judges instead promises to solve this. However, aligning an LLM judge with human judgments is often hard, with many implementation details to consider.
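For context, a judge can be as simple as a single scoring prompt. The sketch below is a minimal, illustrative example rather than a reference implementation: it assumes the openai and weave Python packages are installed and OPENAI_API_KEY is set, and the model name, project name, and 1-5 rubric are placeholders you would replace with your own.

# Minimal LLM-as-a-judge sketch (illustrative only).
import json
import weave
from openai import OpenAI

client = OpenAI()

@weave.op()  # trace judge calls in W&B Weave
def judge_response(question: str, answer: str) -> dict:
    """Ask an LLM to grade an answer from 1-5 and return the parsed verdict."""
    prompt = (
        "You are grading an answer to a question.\n"
        f"Question: {question}\nAnswer: {answer}\n"
        'Reply with JSON only: {"score": <1-5>, "reason": "<one sentence>"}'
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # Assumes the model returns bare JSON as instructed.
    return json.loads(resp.choices[0].message.content)

if __name__ == "__main__":
    weave.init("judgment-day-hackathon")  # hypothetical project name
    print(judge_response("What is 2 + 2?", "4"))

Comparing these scores against a small set of human labels is where the alignment work (and this hackathon) begins.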

During the hackathon, let’s try to build LLM Judges together and move the field forward a little by: 

  • Productionizing the latest LLM-as-a-judge research
  • Improving on your existing judge
  • Building annotation UIs
  • Designing wireframes for collaborative annotation between humans and AI

This hackathon is for you if you are an AI Engineer who:

  • Runs LLMs in production or is planning to soon
  • Has LLM judges and has found them unreliable
  • Wants to learn more about using LLMs as judges
  • Is an LLM judge skeptic

LLM API credits will be provided to those who need them.

$5,000 in cash-equivalent prizes will be awarded to the top 3 overall projects, with a bonus category for the most on-theme project.

Rules:

  • New projects only
  • Maximum team size: 4
  • Make friends
  • Prize eligibility:
    • Project is open-sourced on GitHub
    • Use W&B Weave where applicable

Timing:

Saturday, Sept 21: 10am-10pm

Sunday, Sept 22: 9:30am-5pm
