Last 5 Evaluations: Accuracy vs Cost Analysis

Quick analysis of the last 5 wandbot evaluations showing accuracy and cost trends

Created on June 10 | Last edited on June 10

Wandbot Evaluation Results - Last 5 Runs
Summary
Analysis of the 5 most recent evaluations from the wandbot-eval project, tracking accuracy and associated cost.
Key Findings
- **Best Performance**: v1.3.3 test (June 10) achieved 91.02% accuracy at a cost of $6.03
- **Production Baseline**: v1.3.2 PROD (May 19) achieved 90.41% accuracy at a cost of $6.02
- **Cost Efficiency**: Two intercom trials cost far less ($0.37, $1.88) but had poor accuracy (11.22%, 8.98%)
- **Stable Performance**: The main production versions consistently achieve ~90% accuracy at a cost of about $6
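
The comparison between the test run and the production baseline can be reproduced with a few lines of Python. This is a minimal sketch, not the report's own tooling: the figures are copied by hand from the findings above, and the `Run` class and `compare` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One evaluation run; values are copied by hand from the findings list."""
    name: str
    accuracy_pct: float
    cost_usd: float


def compare(candidate: Run, baseline: Run) -> dict:
    """Return the accuracy and cost differences of candidate relative to baseline."""
    return {
        # Difference in percentage points, rounded to avoid float noise.
        "accuracy_delta_pp": round(candidate.accuracy_pct - baseline.accuracy_pct, 2),
        # Difference in dollars per evaluation.
        "cost_delta_usd": round(candidate.cost_usd - baseline.cost_usd, 2),
    }


v133_test = Run("v1.3.3 test", 91.02, 6.03)
v132_prod = Run("v1.3.2 PROD", 90.41, 6.02)

delta = compare(v133_test, v132_prod)
print(delta)
```

On these figures the test run is 0.61 percentage points more accurate than the baseline for an extra $0.01 per evaluation, which is the "slight improvement" the findings describe.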
Recommendations
- Continue with v1.3.3, which shows a slight accuracy improvement over v1.3.2 at nearly the same cost
- Investigate the intercom trial configurations: their cost is low, but their accuracy is unacceptable
- Keep monitoring cost, which appears stable at around $6 per evaluation