Last 5 Evaluations Analysis
Analysis of accuracy vs cost for the most recent wandbot evaluations
Created on June 10 | Last edited on June 10
Summary
Chart showing accuracy vs cost trade-off for the 5 most recent wandbot evaluations.
Key Findings
- **Best performing**: wandbot_v1.3.3_test-v55-index with 91.02% accuracy at $6.03
- **Lowest cost**: intercom_eval_answers-1_trial at $0.38 (but only 11.22% accuracy)
- **Average accuracy**: 57.39%
- **Average cost**: $4.38
Data
| Evaluation | Date | Accuracy | Cost |
|------------|------|----------|------|
| wandbot_v1.3.3_test-v55-index | 2025-06-10 | 91.02% | $6.03 |
| v1.3.2 PROD | 2025-05-19 | 90.41% | $6.02 |
| wandbot_v1-3-2_o4-mini | 2025-04-17 | 85.31% | $7.61 |
| intercom_eval_answers-1_trial | 2025-05-20 | 11.22% | $0.38 |
| intercom_eval_answers-5_trial | 2025-05-20 | 8.98% | $1.88 |
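The summary figures can be recomputed directly from the table above. The following is a minimal sketch, not the code that produced this report: the `runs` list and the `summarize` helper are hypothetical names introduced here, and the values are copied by hand from the table rather than pulled from the evaluation logs.

```python
# Rows copied by hand from the Data table:
# (evaluation, date, accuracy in %, cost in USD).
runs = [
    ("wandbot_v1.3.3_test-v55-index", "2025-06-10", 91.02, 6.03),
    ("v1.3.2 PROD", "2025-05-19", 90.41, 6.02),
    ("wandbot_v1-3-2_o4-mini", "2025-04-17", 85.31, 7.61),
    ("intercom_eval_answers-1_trial", "2025-05-20", 11.22, 0.38),
    ("intercom_eval_answers-5_trial", "2025-05-20", 8.98, 1.88),
]


def summarize(rows):
    """Return the most accurate run, the cheapest run, and both means.

    Hypothetical helper for illustration; means are unweighted and
    rounded to two decimals to match the precision used in the table.
    """
    best = max(rows, key=lambda r: r[2])       # highest accuracy
    cheapest = min(rows, key=lambda r: r[3])   # lowest cost
    mean_accuracy = sum(r[2] for r in rows) / len(rows)
    mean_cost = sum(r[3] for r in rows) / len(rows)
    return {
        "best": best[0],
        "cheapest": cheapest[0],
        "mean_accuracy": round(mean_accuracy, 2),
        "mean_cost": round(mean_cost, 2),
    }


summary = summarize(runs)
print(summary)
```

Keeping the rows in one list means the headline numbers can be rechecked whenever a new evaluation is added to the table.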
Observations
The v1.3.3 test run and v1.3.2 PROD both score between 90% and 91% accuracy at about $6 per evaluation. The o4-mini variant of v1.3.2 is less accurate (85.31%) and more expensive ($7.61). The two intercom eval trials score far lower (11.22% and 8.98%), which suggests they may be testing different configurations or datasets.
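One way to put the accuracy vs cost trade-off on a single scale is accuracy points per dollar. This ratio is an assumed metric for illustration only; the report does not define it, and `accuracy_per_dollar` is a hypothetical helper. Values are copied from the Data table.

```python
# Accuracy (%) and cost (USD) copied by hand from the Data table.
runs = {
    "wandbot_v1.3.3_test-v55-index": (91.02, 6.03),
    "v1.3.2 PROD": (90.41, 6.02),
    "wandbot_v1-3-2_o4-mini": (85.31, 7.61),
    "intercom_eval_answers-1_trial": (11.22, 0.38),
    "intercom_eval_answers-5_trial": (8.98, 1.88),
}


def accuracy_per_dollar(accuracy, cost):
    """Assumed metric: accuracy percentage points per dollar spent."""
    return accuracy / cost


# Rank runs from most to least accuracy per dollar.
ranked = sorted(
    runs,
    key=lambda name: accuracy_per_dollar(*runs[name]),
    reverse=True,
)

for name in ranked:
    accuracy, cost = runs[name]
    ratio = accuracy_per_dollar(accuracy, cost)
    print(f"{name}: {ratio:.2f} points per dollar")
```

Under this assumed metric the cheapest run, intercom_eval_answers-1_trial, ranks first despite its 11.22% accuracy, so the ratio is best read alongside absolute accuracy rather than on its own.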