Wandbot Evaluations: SVG Chart Version
Simple SVG chart version for W&B compatibility
Created on June 11 | Last edited on June 11
Wandbot Evaluation Performance Analysis (SVG Chart)
Summary
This report compares accuracy and cost for the last five wandbot evaluations, run between Apr 17 and Jun 10.
Key Findings
- **Best Performance**: Jun 10 v1.3.3 (91.0% accuracy, $6.03)
- **Cost Range**: $0.38 - $7.61 per evaluation
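- **Poor Performers**: May 20 trials (Trial 5 at 9.0%, Trial 1 at 11.2% accuracy)
- **Trend**: Excluding the May 20 trials, accuracy rose from 85.3% (o4-mini, Apr 17) to 90.4% (v1.3.2 PROD, May 19) and 91.0% (v1.3.3, Jun 10)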
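The findings above can be recomputed from the five rows of the Raw Data table. The following is a minimal sketch, not the code used to produce this report: the names `EVALS` and `summarize` are hypothetical, and the 50% cutoff for a "poor performer" is an assumption chosen only to separate the May 20 trials from the other runs.

```python
# Rows copied from the Raw Data table: (date, version, accuracy %, cost USD).
EVALS = [
    ("Jun 10", "v1.3.3", 91.0, 6.03),
    ("May 20", "Trial 5", 9.0, 1.88),
    ("May 20", "Trial 1", 11.2, 0.38),
    ("May 19", "v1.3.2 PROD", 90.4, 6.02),
    ("Apr 17", "o4-mini", 85.3, 7.61),
]


def summarize(evals, poor_threshold=50.0):
    """Derive the key findings from a list of evaluation rows.

    poor_threshold is an assumed cutoff; the report does not define one.
    """
    best = max(evals, key=lambda row: row[2])  # highest accuracy
    costs = [row[3] for row in evals]
    poor = [row[1] for row in evals if row[2] < poor_threshold]
    return {
        "best": best,
        "cost_min": min(costs),
        "cost_max": max(costs),
        "poor": poor,
    }


summary = summarize(EVALS)
print(summary)
```

Keeping the rows as plain tuples makes it easy to append a new evaluation and rerun the summary.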
Raw Data
| Date | Version | Accuracy | Cost |
|------|---------|----------|------|
| Jun 10 | v1.3.3 | 91.0% | $6.03 |
| May 20 | Trial 5 | 9.0% | $1.88 |
| May 20 | Trial 1 | 11.2% | $0.38 |
| May 19 | v1.3.2 PROD | 90.4% | $6.02 |
| Apr 17 | o4-mini | 85.3% | $7.61 |
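
One way to turn the accuracy column into a simple SVG bar chart is sketched below. The report's actual chart is not reproduced here; the function name `accuracy_svg`, the layout constants, and the fill color are all assumptions for illustration, and only the Python standard library is used.

```python
from xml.sax.saxutils import escape

# (label, accuracy %) pairs copied from the table above.
ROWS = [
    ("Jun 10 v1.3.3", 91.0),
    ("May 20 Trial 5", 9.0),
    ("May 20 Trial 1", 11.2),
    ("May 19 v1.3.2 PROD", 90.4),
    ("Apr 17 o4-mini", 85.3),
]


def accuracy_svg(rows, bar_h=24, gap=8, label_w=150, max_w=300):
    """Return an SVG string with one horizontal bar per (label, accuracy) row.

    All layout values are hypothetical defaults, in pixels.
    """
    height = len(rows) * (bar_h + gap) + gap
    width = label_w + max_w + 60  # extra room for the percentage label
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
    ]
    for i, (label, acc) in enumerate(rows):
        y = gap + i * (bar_h + gap)
        w = max_w * acc / 100.0  # bar width is proportional to accuracy
        text_y = y + bar_h * 0.7
        parts.append(
            f'<text x="0" y="{text_y:.1f}" font-size="12">{escape(label)}</text>'
        )
        parts.append(
            f'<rect x="{label_w}" y="{y}" width="{w:.1f}" height="{bar_h}" fill="steelblue"/>'
        )
        parts.append(
            f'<text x="{label_w + w + 4:.1f}" y="{text_y:.1f}" font-size="12">{acc:.1f}%</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


svg = accuracy_svg(ROWS)
print(svg)
```

The resulting string can be saved to a `.svg` file or pasted into any viewer that renders inline SVG. A second set of bars for cost could be added the same way, scaled against the $7.61 maximum.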