Accuracy Scorer
Created on December 2 | Last edited on December 3
We also support the classic accuracy scorer, which handles three well-known tasks: binary accuracy, multi-class accuracy, and multi-label accuracy.
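To make the three tasks concrete, here is a minimal pure-Python sketch. It is not the scorer's implementation: the `accuracy` helper and the toy labels are made up for illustration, and the multi-label case assumes exact-match (subset) accuracy, which is only one of several common conventions.

```python
from typing import Sequence


def accuracy(predictions: Sequence, targets: Sequence) -> float:
    """Fraction of predictions that exactly equal their target (hypothetical helper)."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if len(targets) == 0:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Binary: every label is 0 or 1 (toy data).
binary_acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])

# Multi-class: every label is exactly one of several classes (toy data).
multiclass_acc = accuracy(["cat", "dog", "bird"], ["cat", "dog", "dog"])

# Multi-label: every example carries a set of labels. ASSUMPTION: a prediction
# counts as correct only when the whole set matches the target set.
multilabel_acc = accuracy(
    [{"action", "comedy"}, {"drama"}],
    [{"action", "comedy"}, {"drama", "romance"}],
)

print(binary_acc, multiclass_acc, multilabel_acc)
```

The same comparison covers all three cases because only the shape of a single label changes: a bit, a class name, or a set of class names.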
Definition
Accuracy is simply given by:
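In its standard form, assumed here to be the definition the scorer follows, accuracy is the fraction of predictions that match their labels. The symbols $N$ (number of examples), $y_i$ (true label), and $\hat{y}_i$ (predicted label) are notation introduced for this sketch:

```latex
\mathrm{Accuracy}
  = \frac{\text{number of correct predictions}}{\text{total number of predictions}}
  = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[\hat{y}_i = y_i\right]
```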
AccuracyScorer
Try out the Colab notebook to see how to use this scorer for different tasks.
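import weave
from weave.scorers import AccuracyScorer

# Configure the scorer for one of the supported tasks; "binary" is shown here.
accuracy_scorer = AccuracyScorer(task="binary")

# Named `evaluation` rather than `eval` to avoid shadowing the Python built-in.
evaluation = weave.Evaluation(
    dataset=...,
    scorers=[accuracy_scorer],
)
# evaluate your model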
Binary Accuracy
Here's a comparison of three LLM systems, using gpt-3.5-turbo, gpt-4o-mini, and gpt-4o, on the IMDB sentiment analysis dataset.

Figure 1: Comparison of three LLM systems using the AccuracyScorer.