Wandbot Evaluations: Accuracy vs Cost (Fixed)
Fixed version with W&B-compatible HTML chart
Created on June 11 | Last edited on June 11
Wandbot Evaluation Performance Analysis (Fixed)
Summary
Analysis of the last 5 wandbot evaluations showing accuracy and cost metrics.
Key Findings
- **Best Performance**: Jun 10 v1.3.3 (91.0%) and May 19 v1.3.2 PROD (90.4%) both achieved ~90% accuracy
- **Cost Range**: $0.38 - $7.61 per evaluation
- **Poor Performers**: May 20 trials showed very low accuracy (9-11%)
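- **Cost vs Accuracy**: The three runs above 85% accuracy cost $6.02 to $7.61 each, while both low-accuracy May 20 trials cost under $2. Cost alone does not predict accuracy: the most expensive run (Apr 17 o4-mini, $7.61) scored below the two ~$6 runs.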
Evaluation Details
1. **Jun 10 v1.3.3**: 91.0% accuracy, $6.03 cost
2. **May 20 Trial 5**: 9.0% accuracy, $1.88 cost
3. **May 20 Trial 1**: 11.2% accuracy, $0.38 cost
4. **May 19 v1.3.2 PROD**: 90.4% accuracy, $6.02 cost
5. **Apr 17 o4-mini**: 85.3% accuracy, $7.61 cost
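
The Key Findings can be re-derived from the five rows above. The following is a minimal Python sketch, not part of the original report: the tuple layout and the helper names `summarize` and `accuracy_points_per_dollar` are hypothetical, and "accuracy points per dollar" is an illustrative derived metric that the report itself does not use.

```python
# Figures copied from the Evaluation Details list: (name, accuracy %, cost in USD).
# The data layout and helper names are illustrative assumptions.
evaluations = [
    ("Jun 10 v1.3.3", 91.0, 6.03),
    ("May 20 Trial 5", 9.0, 1.88),
    ("May 20 Trial 1", 11.2, 0.38),
    ("May 19 v1.3.2 PROD", 90.4, 6.02),
    ("Apr 17 o4-mini", 85.3, 7.61),
]


def summarize(evals):
    """Return the cost range plus the most accurate and most expensive runs."""
    costs = [cost for _, _, cost in evals]
    best = max(evals, key=lambda e: e[1])      # highest accuracy
    priciest = max(evals, key=lambda e: e[2])  # highest cost
    return {
        "cost_min": min(costs),
        "cost_max": max(costs),
        "best_run": best[0],
        "priciest_run": priciest[0],
    }


def accuracy_points_per_dollar(evals):
    """Illustrative ratio: accuracy percentage divided by cost, rounded to 1 decimal."""
    return {name: round(acc / cost, 1) for name, acc, cost in evals}


if __name__ == "__main__":
    print(summarize(evaluations))
    for name, ratio in accuracy_points_per_dollar(evaluations).items():
        print(f"{name}: {ratio} accuracy points per dollar")
```

The ratio should be read alongside absolute accuracy rather than on its own, since a very cheap run can score well on it while still being unusable at roughly 10% accuracy.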