
Wandbot Evaluations: Accuracy vs Cost (Fixed)

Fixed version with W&B compatible HTML chart
Created on June 11|Last edited on June 11

Wandbot Evaluation Performance Analysis (Fixed)

Summary

Analysis of the last 5 wandbot evaluations showing accuracy and cost metrics.

Key Findings

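- **Best Performance**: Jun 10 v1.3.3 (91.0%) and May 19 v1.3.2 PROD (90.4%) both achieved ~90% accuracy
- **Cost Range**: $0.38 - $7.61 per evaluation
- **Poor Performers**: The May 20 trials showed very low accuracy (9-11%)
- **Cost vs Accuracy**: Higher-performing runs generally cost more (about $6 or more per evaluation, versus under $2 for the low-accuracy trials), although the most expensive run (Apr 17 o4-mini, $7.61) was not the most accurate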

Evaluation Details

1. **Jun 10 v1.3.3**: 91.0% accuracy, $6.03 cost
2. **May 20 Trial 5**: 9.0% accuracy, $1.88 cost
3. **May 20 Trial 1**: 11.2% accuracy, $0.38 cost
4. **May 19 v1.3.2 PROD**: 90.4% accuracy, $6.02 cost
5. **Apr 17 o4-mini**: 85.3% accuracy, $7.61 cost
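
To make the cost-versus-accuracy comparison easier to reproduce, here is a minimal Python sketch over the five evaluations listed above. The figures are copied from this report. The `cost_per_point` helper (USD per percentage point of accuracy) is an illustrative metric assumed here, not one the report defines.

```python
# Evaluation results copied from the report: (label, accuracy in %, cost in USD).
evaluations = [
    ("Jun 10 v1.3.3", 91.0, 6.03),
    ("May 20 Trial 5", 9.0, 1.88),
    ("May 20 Trial 1", 11.2, 0.38),
    ("May 19 v1.3.2 PROD", 90.4, 6.02),
    ("Apr 17 o4-mini", 85.3, 7.61),
]


def cost_per_point(accuracy: float, cost: float) -> float:
    """USD spent per percentage point of accuracy (hypothetical metric)."""
    return cost / accuracy


# Most accurate run and cheapest run.
best = max(evaluations, key=lambda e: e[1])
cheapest = min(evaluations, key=lambda e: e[2])

# Rank runs from lowest to highest cost per accuracy point.
ranked = sorted(evaluations, key=lambda e: cost_per_point(e[1], e[2]))

for label, acc, cost in ranked:
    print(f"{label:<20} {acc:5.1f}%  ${cost:.2f}  ${cost_per_point(acc, cost):.3f}/pt")
```

A ratio like this should be read alongside absolute accuracy: May 20 Trial 1 ranks first on cost per point only because it was very cheap, while its 11.2% accuracy makes it unusable in practice.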