Last 5 Evaluations: Accuracy & Cost Analysis
Performance overview of the most recent wandbot evaluations
Created on June 10 | Last edited on June 10
Overview
This report analyzes the performance and cost metrics of the 5 most recent wandbot evaluations.
Key Findings
**Best-Performing Evaluations:**
- **v1.3.3 test** (June 2025): 91.02% accuracy, $6.03 cost
- **v1.3.2 PROD** (May 2025): 90.41% accuracy, $6.02 cost
**Cost Efficiency:**
- The two v1.3.x evaluations (one PROD, one test) each cost about $6 ($6.02 and $6.03)
- The Intercom trials were far cheaper ($0.37 and $1.88) but scored below 12% accuracy
**Accuracy Trends:**
- Both v1.3.x versions exceed 90% accuracy (90.41% and 91.02%)
- The Intercom evaluations performed poorly (8.98% and 11.22% accuracy)
- The o4-mini evaluation reached 85.31% accuracy at the highest cost of the five ($7.61)
Performance Summary
| Evaluation | Date | Accuracy | Cost |
|------------|------|----------|------|
| v1.3.3 test | 2025-06-10 | 91.02% | $6.03 |
| v1.3.2 PROD | 2025-05-19 | 90.41% | $6.02 |
| intercom-5 | 2025-05-20 | 8.98% | $1.88 |
| intercom-1 | 2025-05-20 | 11.22% | $0.37 |
| o4-mini eval | 2025-04-17 | 85.31% | $7.61 |
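To make the cost-efficiency comparison concrete, the sketch below transcribes the five table rows into Python and ranks them. It is a minimal illustration under stated assumptions: the `EvalRun` class, the `most_efficient` helper, and the "accuracy points per dollar" metric are hypothetical and are not part of wandbot or its evaluation tooling. The cost column is assumed to be the total cost of each evaluation run.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EvalRun:
    """One row of the Performance Summary table (hypothetical container)."""
    name: str
    date: str            # ISO-8601 date string, copied from the table
    accuracy_pct: float
    cost_usd: float      # assumed: total cost of the evaluation run

    @property
    def accuracy_per_dollar(self) -> float:
        # Hypothetical efficiency metric: accuracy percentage points per USD.
        return self.accuracy_pct / self.cost_usd


# Values transcribed from the Performance Summary table.
RUNS = [
    EvalRun("v1.3.3 test", "2025-06-10", 91.02, 6.03),
    EvalRun("v1.3.2 PROD", "2025-05-19", 90.41, 6.02),
    EvalRun("intercom-5", "2025-05-20", 8.98, 1.88),
    EvalRun("intercom-1", "2025-05-20", 11.22, 0.37),
    EvalRun("o4-mini eval", "2025-04-17", 85.31, 7.61),
]


def most_efficient(runs: List[EvalRun], min_accuracy: float = 0.0) -> Optional[EvalRun]:
    """Return the run with the best accuracy per dollar among runs that
    reach min_accuracy, or None if no run qualifies."""
    eligible = [r for r in runs if r.accuracy_pct >= min_accuracy]
    return max(eligible, key=lambda r: r.accuracy_per_dollar, default=None)


if __name__ == "__main__":
    # Print the runs newest-first with their efficiency ratio.
    for r in sorted(RUNS, key=lambda r: r.date, reverse=True):
        print(f"{r.name:<13} {r.date}  {r.accuracy_pct:6.2f}%  "
              f"${r.cost_usd:5.2f}  {r.accuracy_per_dollar:6.2f} pts/$")
    print("Most efficient overall:", most_efficient(RUNS).name)
    print("Most efficient at >=85% accuracy:", most_efficient(RUNS, 85.0).name)
```

A raw accuracy-per-dollar ratio ranks intercom-1 first (about 30 points per dollar) simply because it is so cheap, even though it answers only 11.22% of questions correctly. Applying an accuracy floor before ranking, as `min_accuracy` does here, keeps the comparison among usable configurations; with a floor of 85%, v1.3.3 test comes out slightly ahead of v1.3.2 PROD (about 15.09 versus 15.02 points per dollar).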