Last 5 Evaluations: Accuracy vs Cost Analysis

Quick analysis of the last 5 wandbot evaluations showing accuracy and cost trends
Created on June 10 | Last edited on June 10

Wandbot Evaluation Results - Last 5 Runs

Summary

Analysis of the most recent 5 evaluations from the wandbot-eval project, tracking accuracy performance and associated costs.

Key Findings

- **Best Performance**: v1.3.3 test (June 10) achieved 91.02% accuracy at a cost of $6.03
- **Production Baseline**: v1.3.2 PROD (May 19) showed 90.41% accuracy at a cost of $6.02
- **Cost Efficiency**: Two intercom trials had very low costs ($0.37, $1.88) but poor accuracy (11.22%, 8.98%)
- **Stable Performance**: Main production versions consistently achieve ~90% accuracy at around $6 cost
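
The comparison behind these findings can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used to produce the report: the `EvalRun` class, the labels "intercom trial A" and "intercom trial B", and the 85% accuracy floor are all hypothetical, and the pairing of each intercom cost with each intercom accuracy assumes the figures above are listed in matching order. Only the four runs quoted in the findings are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalRun:
    """One evaluation run (hypothetical container, not a wandbot type)."""
    label: str
    accuracy_pct: float
    cost_usd: float

    @property
    def accuracy_per_dollar(self) -> float:
        # Accuracy points obtained per dollar spent on the evaluation.
        return self.accuracy_pct / self.cost_usd


# Figures copied from the Key Findings list. The intercom labels and the
# cost/accuracy pairing are assumptions (values taken in listed order).
RUNS = [
    EvalRun("v1.3.3 test", 91.02, 6.03),
    EvalRun("v1.3.2 PROD", 90.41, 6.02),
    EvalRun("intercom trial A", 11.22, 0.37),
    EvalRun("intercom trial B", 8.98, 1.88),
]

# Hypothetical accuracy floor; the report does not state a threshold.
MIN_ACCURACY_PCT = 85.0


def acceptable(runs, min_accuracy_pct=MIN_ACCURACY_PCT):
    """Return the runs whose accuracy meets the floor, in input order."""
    return [r for r in runs if r.accuracy_pct >= min_accuracy_pct]


def best_run(runs, min_accuracy_pct=MIN_ACCURACY_PCT):
    """Return the most accurate run among those that meet the floor."""
    return max(acceptable(runs, min_accuracy_pct), key=lambda r: r.accuracy_pct)


if __name__ == "__main__":
    for run in RUNS:
        print(f"{run.label}: {run.accuracy_pct:.2f}% at ${run.cost_usd:.2f} "
              f"({run.accuracy_per_dollar:.2f} points per dollar)")
    print("Best acceptable run:", best_run(RUNS).label)
```

Ranking by accuracy per dollar alone would put the $0.37 intercom trial first, since a very small cost inflates the ratio even when accuracy is poor. Applying an accuracy floor before comparing costs avoids that trap, which is why the sketch filters first and only then picks the best run.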

Recommendations

- Continue with v1.3.3, which improves slightly on v1.3.2 (91.02% vs 90.41% accuracy) at essentially the same cost
- Investigate the intercom trial configurations: their cost is low, but their accuracy is unacceptable
- Keep monitoring cost, which appears stable at around $6 per evaluation for the main versions