Last 5 Evaluations: Accuracy & Cost Analysis

Analysis of the most recent 5 wandbot evaluations showing accuracy scores and associated costs
Created on June 11|Last edited on June 11

Last 5 Wandbot Evaluations Analysis

Summary

This report analyzes the performance and cost of the 5 most recent wandbot evaluations.

Key Findings

- **Best Overall Performance**: v1.3.3 (June 10) with 2.88 accuracy score at $6.03
- **Production Stability**: v1.3.2 PROD maintained high accuracy (2.87) with reasonable cost ($6.02)
- **Cost Efficiency**: Trial evaluations showed significantly lower costs but reduced accuracy
- **Model Evolution**: The o4-mini version had highest cost ($7.61) but lower accuracy than current versions

Recommendations

- v1.3.3 shows the best balance of accuracy and cost efficiency
- Continue monitoring cost vs performance trade-offs in future evaluations
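
One way to make the accuracy-versus-cost comparison concrete is to compute accuracy points per dollar for each evaluation. The sketch below is illustrative only: the `Evaluation` class, the `accuracy_per_dollar` metric, and the `rank_by_efficiency` helper are assumptions introduced here, not part of the wandbot evaluation pipeline. It includes only the two evaluations for which this report states both an accuracy score and a cost; the trial and o4-mini runs are left out because their exact accuracy scores are not given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    """One evaluation run (hypothetical container for this sketch)."""
    name: str
    accuracy: float   # accuracy score as reported in Key Findings
    cost_usd: float   # evaluation cost in US dollars

    @property
    def accuracy_per_dollar(self) -> float:
        # Assumed efficiency metric: accuracy points obtained per dollar spent.
        return self.accuracy / self.cost_usd


# Only the runs with both figures stated in the report.
evaluations = [
    Evaluation("v1.3.3", accuracy=2.88, cost_usd=6.03),
    Evaluation("v1.3.2 PROD", accuracy=2.87, cost_usd=6.02),
]


def rank_by_efficiency(evals):
    """Return evaluations sorted from most to least accuracy per dollar."""
    return sorted(evals, key=lambda e: e.accuracy_per_dollar, reverse=True)


for e in rank_by_efficiency(evaluations):
    print(f"{e.name}: {e.accuracy_per_dollar:.4f} accuracy points per dollar")
```

Under this metric the two versions are nearly identical, with v1.3.3 marginally ahead, which is consistent with the recommendation above. A single ratio hides absolute differences, so it is best read alongside the raw accuracy and cost figures.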
