Last 5 Evaluations Analysis

Analysis of accuracy vs cost for the most recent wandbot evaluations
Created on June 10 | Last edited on June 10

Summary

Chart showing accuracy vs cost trade-off for the 5 most recent wandbot evaluations.

Key Findings

- **Best performing**: wandbot_v1.3.3_test-v55-index with 91.02% accuracy at $6.03
- **Lowest cost**: intercom_eval_answers-1_trial at $0.38 (but only 11.22% accuracy)
- **Average accuracy**: 57.39% (mean of the five accuracy values in the Data table)
- **Average cost**: $4.38

Data

| Evaluation | Date | Accuracy | Cost |
|------------|------|----------|------|
| wandbot_v1.3.3_test-v55-index | 2025-06-10 | 91.02% | $6.03 |
| v1.3.2 PROD | 2025-05-19 | 90.41% | $6.02 |
| wandbot_v1-3-2_o4-mini | 2025-04-17 | 85.31% | $7.61 |
| intercom_eval_answers-1_trial | 2025-05-20 | 11.22% | $0.38 |
| intercom_eval_answers-5_trial | 2025-05-20 | 8.98% | $1.88 |
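
The summary figures can be recomputed directly from the table. The following is a minimal sketch, not the code that produced this report: the `EVALS` list and the `summarize` helper are hypothetical names introduced here, and the values are copied by hand from the rows above.

```python
from statistics import mean

# (evaluation, date, accuracy in %, cost in USD), copied from the Data table.
# EVALS and summarize are illustrative names, not part of wandbot.
EVALS = [
    ("wandbot_v1.3.3_test-v55-index", "2025-06-10", 91.02, 6.03),
    ("v1.3.2 PROD", "2025-05-19", 90.41, 6.02),
    ("wandbot_v1-3-2_o4-mini", "2025-04-17", 85.31, 7.61),
    ("intercom_eval_answers-1_trial", "2025-05-20", 11.22, 0.38),
    ("intercom_eval_answers-5_trial", "2025-05-20", 8.98, 1.88),
]


def summarize(rows):
    """Return the best-accuracy run, the lowest-cost run, and the two means."""
    best = max(rows, key=lambda r: r[2])      # highest accuracy
    cheapest = min(rows, key=lambda r: r[3])  # lowest cost
    return {
        "best": best[0],
        "lowest_cost": cheapest[0],
        "mean_accuracy": round(mean(r[2] for r in rows), 2),
        "mean_cost": round(mean(r[3] for r in rows), 2),
    }


if __name__ == "__main__":
    for key, value in summarize(EVALS).items():
        print(f"{key}: {value}")
```

Keeping the rows in one list means a new evaluation only needs one extra tuple, and the averages follow from the data rather than being typed in separately.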

Observations

The main wandbot runs (v1.3.3 and v1.3.2) perform consistently, at 91.02% and 90.41% accuracy for about $6 each. The o4-mini run is both less accurate (85.31%) and more expensive ($7.61). The two intercom eval trials score far lower (11.22% and 8.98%), suggesting they may be testing different configurations or datasets.