
Accuracy Scorer

Created on December 2|Last edited on December 3
We also support an accuracy scorer that handles three well-known tasks: binary accuracy, multi-class accuracy, and multi-label accuracy.

Definition

Accuracy is simply given by:
$$\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}$$
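To make the definition concrete, here is a minimal plain-Python sketch of the ratio applied to the three task types. It is not the AccuracyScorer implementation: the `accuracy` helper and the sample labels are hypothetical. The multi-label case assumes an exact match of the whole label set, which is only one common convention and may differ from what the scorer does.

```python
from typing import Hashable, Sequence


def accuracy(predictions: Sequence[Hashable], targets: Sequence[Hashable]) -> float:
    """Return the fraction of predictions that exactly equal their target."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if len(targets) == 0:
        raise ValueError("cannot compute accuracy on an empty dataset")
    # Count the positions where the prediction matches the target.
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Binary: each label is 0 or 1. Three of the four predictions match.
binary_acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])

# Multi-class: one label per example, drawn from more than two classes.
# Two of the three predictions match.
multiclass_acc = accuracy(["cat", "dog", "bird"], ["cat", "dog", "dog"])

# Multi-label: each example carries a set of labels.
# Assumption: a prediction counts as correct only if the whole set matches.
multilabel_acc = accuracy(
    [frozenset({"a", "b"}), frozenset({"c"})],
    [frozenset({"a", "b"}), frozenset({"b", "c"})],
)
```

Under the exact-match assumption, a multi-label prediction that gets only some labels right scores zero for that example. A per-label average would be more lenient.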

AccuracyScorer


Try out the colab notebook to see how we can use this scorer for different tasks.
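import weave
from weave.scorers import AccuracyScorer

# Choose the task type; "binary" is used here.
accuracy_scorer = AccuracyScorer(task="binary")

# Named `evaluation` to avoid shadowing Python's built-in `eval`.
evaluation = weave.Evaluation(
    dataset=...,
    scorers=[accuracy_scorer],
)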

# evaluate your model

Binary Accuracy

Here's a comparison of three LLM systems, using gpt-3.5-turbo, gpt-4o-mini, and gpt-4o, on the IMDB sentiment analysis dataset.
Figure 1: Comparison of three LLM systems using the AccuracyScorer.