Last 5 Evaluations: Accuracy vs Cost Analysis
Quick analysis of the last 5 wandbot evaluations showing accuracy and cost trends
Created on June 10 | Last edited on June 10
Wandbot Evaluation Results - Last 5 Runs
Summary
Analysis of the most recent 5 evaluations from the wandbot-eval project, tracking accuracy performance and associated costs.
Key Findings
- **Best Performance**: v1.3.3 test (June 10) achieved 91.02% accuracy at $6.03 cost
- **Production Baseline**: v1.3.2 PROD (May 19) showed 90.41% accuracy at $6.02 cost
- **Low Cost, Low Accuracy**: Two intercom trials cost far less ($0.37, $1.88) but reached only 11.22% and 8.98% accuracy
- **Stable Performance**: The v1.3.2 PROD and v1.3.3 test runs both reach roughly 90-91% accuracy at about $6 per evaluation
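One way to put these runs on a single scale is cost per accuracy point. The sketch below is illustrative only: the intercom run labels are placeholders, and it assumes the intercom costs and accuracies above are listed in matching order, which the report does not state explicitly.

```python
# (label, accuracy in %, cost in USD), taken from Key Findings.
# The intercom labels are hypothetical, and pairing each cost with each
# accuracy assumes the two lists above are in the same order.
runs = [
    ("v1.3.3 test", 91.02, 6.03),
    ("v1.3.2 PROD", 90.41, 6.02),
    ("intercom trial A", 11.22, 0.37),
    ("intercom trial B", 8.98, 1.88),
]


def cost_per_point(accuracy_pct, cost_usd):
    """Dollars spent per percentage point of accuracy."""
    return cost_usd / accuracy_pct


for label, acc, cost in runs:
    ratio = cost_per_point(acc, cost)
    print(f"{label:<18} {acc:6.2f}%  ${cost:.2f}  ${ratio:.4f}/pt")
```

This ratio is only meaningful among runs that already clear an acceptable accuracy level; a cheap run with very low accuracy can still score well on it.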
Recommendations
- Continue with v1.3.3: it is 0.61 percentage points more accurate than v1.3.2 (91.02% vs 90.41%) at essentially the same cost ($6.03 vs $6.02)
- Investigate intercom trial configurations - low cost but unacceptable accuracy
- Keep monitoring cost, which is stable at about $6 per evaluation for the v1.3.2 and v1.3.3 runs
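The comparison behind these recommendations could be automated with a simple gate against the production baseline. This is a minimal sketch, not part of the wandbot-eval project: the function names and both thresholds (`max_accuracy_drop`, `max_cost_increase`) are assumptions chosen for illustration.

```python
# Figures from this report; thresholds below are hypothetical.
baseline = {"accuracy": 90.41, "cost": 6.02}   # v1.3.2 PROD
candidate = {"accuracy": 91.02, "cost": 6.03}  # v1.3.3 test


def compare(candidate, baseline):
    """Return accuracy and cost deltas of a candidate run vs the baseline."""
    return {
        "accuracy_delta_pts": round(candidate["accuracy"] - baseline["accuracy"], 2),
        "cost_delta_usd": round(candidate["cost"] - baseline["cost"], 2),
    }


def passes_gate(candidate, baseline, max_accuracy_drop=1.0, max_cost_increase=0.50):
    """True if accuracy has not dropped, and cost has not risen, beyond the limits."""
    delta = compare(candidate, baseline)
    return (
        delta["accuracy_delta_pts"] >= -max_accuracy_drop
        and delta["cost_delta_usd"] <= max_cost_increase
    )


print(compare(candidate, baseline))
print(passes_gate(candidate, baseline))
```

Under these assumed thresholds, a run like the intercom trials would fail on accuracy regardless of its low cost.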