
Last 5 Evaluations: Accuracy & Cost Analysis

Performance overview of the most recent wandbot evaluations
Created on June 10 | Last edited on June 10

Last 5 Wandbot Evaluations: Accuracy & Cost Analysis

Overview

This report analyzes the performance and cost metrics of the 5 most recent wandbot evaluations.

Key Findings

**Best Performing Models:**
- **v1.3.3 test** (June 2025): 91.02% accuracy, $6.03 cost
- **v1.3.2 PROD** (May 2025): 90.41% accuracy, $6.02 cost

**Cost Efficiency:**
- Production models maintain consistent costs of around $6
- Intercom trials cost far less ($0.37 and $1.88) but had poor accuracy

**Accuracy Trends:**
- Main production versions consistently achieve >90% accuracy
- Intercom evaluations performed poorly (~9-11% accuracy)
- The o4-mini evaluation achieved 85.31% accuracy at the highest cost of the five ($7.61)

Performance Summary

| Evaluation | Date | Accuracy | Cost |
|------------|------|----------|------|
| v1.3.3 test | 2025-06-10 | 91.02% | $6.03 |
| v1.3.2 PROD | 2025-05-19 | 90.41% | $6.02 |
| intercom-5 | 2025-05-20 | 8.98% | $1.88 |
| intercom-1 | 2025-05-20 | 11.22% | $0.37 |
| o4-mini eval | 2025-04-17 | 85.31% | $7.61 |
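
One way to compare accuracy and cost together is dollars spent per accuracy point. The sketch below is illustrative and not part of the wandbot evaluation pipeline: the helper names `cost_per_point` and `rank_by_cost_per_point`, and the `min_accuracy` floor, are assumptions introduced here. The rows are copied from the summary table.

```python
# Rows copied from the Performance Summary table:
# (evaluation name, accuracy in percent, cost in USD as reported)
evaluations = [
    ("v1.3.3 test", 91.02, 6.03),
    ("v1.3.2 PROD", 90.41, 6.02),
    ("intercom-5", 8.98, 1.88),
    ("intercom-1", 11.22, 0.37),
    ("o4-mini eval", 85.31, 7.61),
]


def cost_per_point(accuracy_pct, cost_usd):
    """Dollars spent per percentage point of accuracy (hypothetical metric)."""
    return cost_usd / accuracy_pct


def rank_by_cost_per_point(rows, min_accuracy=0.0):
    """Sort rows from cheapest to most expensive per accuracy point.

    min_accuracy is an assumed quality floor: rows below it are dropped
    so that a cheap but inaccurate run cannot top the ranking.
    """
    eligible = [row for row in rows if row[1] >= min_accuracy]
    return sorted(eligible, key=lambda row: cost_per_point(row[1], row[2]))


if __name__ == "__main__":
    # 85.0 is an arbitrary example threshold, not a value from the report.
    for name, acc, cost in rank_by_cost_per_point(evaluations, min_accuracy=85.0):
        ratio = cost_per_point(acc, cost)
        print(f"{name:<13} {acc:6.2f}%  ${cost:5.2f}  ${ratio:.4f} per point")
```

Without an accuracy floor, intercom-1 ranks first simply because it is so cheap, which is why the ratio is only meaningful among runs that already meet a usable accuracy level. With the example floor of 85%, the order is v1.3.3 test, v1.3.2 PROD, then o4-mini eval.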