# OpenAI Model Usage Analysis
## Overview
This report analyzes the usage patterns of OpenAI models in the wandb-applied-ai-team/mcp-tests Weave project. The data shows how each model is used over time, and for what purpose, within the Wandbot technical support system.
## Model Distribution
The analysis reveals two primary models in the system:

- **GPT-4 Turbo (1106)**: used primarily for evaluation tasks (~70% of calls)
- **GPT-4o**: used mainly for generating responses (~30% of calls)

This split reflects a dual-model architecture in which one model generates responses and another evaluates them for quality assurance.
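The distribution above can be reproduced by pulling the project's call traces and tallying the model name recorded on each OpenAI call. The sketch below is a minimal example assuming the Weave Python SDK's `get_calls` client method and assuming each traced call stores the model name under `inputs["model"]`; both are assumptions about the trace schema, so adjust to match your SDK version.

```python
from collections import Counter

import weave

# Initialize a client against the project analyzed in this report.
client = weave.init("wandb-applied-ai-team/mcp-tests")

# Fetch traced calls. NOTE: get_calls is an assumption about the Weave
# SDK surface; consult the Weave docs for the exact method and filter
# syntax in your SDK version.
calls = client.get_calls()

# Tally model usage. We assume each OpenAI chat call records the model
# name under inputs["model"]; calls without it are skipped.
model_counts = Counter(
    call.inputs["model"]
    for call in calls
    if "model" in (call.inputs or {})
)

total = sum(model_counts.values())
for model, count in model_counts.most_common():
    print(f"{model}: {count} calls ({count / total:.0%})")
```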
## Time-Based Analysis
Based on our analysis of 50 OpenAI chat traces spanning a 13-minute window (14:07-14:19 UTC), we observed the following pattern:

- Both models are used consistently throughout the window
- GPT-4 Turbo (1106) is used more frequently, accounting for approximately 70% of all API calls
- GPT-4o accounts for approximately 30% of all API calls
- The interleaving suggests a pipeline in which GPT-4o generates responses and GPT-4 Turbo evaluates them
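To inspect the pattern minute by minute, the calls can be bucketed by timestamp and counted per model. A minimal sketch using only the standard library, with hypothetical placeholder records standing in for the (timestamp, model) pairs you would extract from the traces:

```python
from collections import Counter
from datetime import datetime

# Hypothetical flattened records: (timestamp, model) pairs pulled from
# the traces, e.g. via the get_calls sketch above. These two entries
# are placeholders, not real trace data.
records = [
    (datetime(2025, 3, 15, 14, 7, 12), "gpt-4-1106-preview"),
    (datetime(2025, 3, 15, 14, 7, 45), "gpt-4o"),
    # ... one entry per traced call ...
]

# Bucket calls into per-minute (minute, model) bins by truncating
# each timestamp to the start of its minute.
per_minute = Counter(
    (ts.replace(second=0, microsecond=0), model) for ts, model in records
)

# Print a simple minute-by-minute breakdown for each model.
for (minute, model), count in sorted(per_minute.items()):
    print(f"{minute:%H:%M} {model}: {count}")
```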
## Conclusions
This dual-model approach demonstrates an effective architecture: one model (GPT-4o) generates responses while another (GPT-4 Turbo) acts as a quality-control layer, evaluating each response before it reaches users. The same pattern may be useful for other applications that need both generation and evaluation components.
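As an illustration of the generate-then-evaluate pattern, the sketch below wires the two models together with the official `openai` Python SDK. The system prompts, the 1-5 rubric, and the sample question are hypothetical; the report does not describe Wandbot's actual prompts or evaluation criteria.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

GENERATOR_MODEL = "gpt-4o"
EVALUATOR_MODEL = "gpt-4-1106-preview"  # GPT-4 Turbo (1106)


def generate_response(question: str) -> str:
    """Draft an answer with the generator model."""
    completion = client.chat.completions.create(
        model=GENERATOR_MODEL,
        messages=[
            {"role": "system", "content": "You are a technical support assistant."},
            {"role": "user", "content": question},
        ],
    )
    return completion.choices[0].message.content


def evaluate_response(question: str, answer: str) -> str:
    """Score the draft with the evaluator model (hypothetical rubric)."""
    completion = client.chat.completions.create(
        model=EVALUATOR_MODEL,
        messages=[
            {
                "role": "system",
                "content": "Rate the answer to the question on a 1-5 scale "
                           "for accuracy and helpfulness. Reply with the score only.",
            },
            {"role": "user", "content": f"Question: {question}\n\nAnswer: {answer}"},
        ],
    )
    return completion.choices[0].message.content


question = "How do I log a confusion matrix with wandb?"
answer = generate_response(question)
score = evaluate_response(question, answer)
print(f"score={score}\n{answer}")
```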