
# OpenAI Model Usage Analysis

## Overview

This report analyzes the usage patterns of OpenAI models in the wandb-applied-ai-team/mcp-tests Weave project. The data shows how different models serve distinct roles over time in the Wandbot technical support system.
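For readers who want to reproduce the numbers below, here is a minimal sketch of pulling the chat traces with the Weave Python client. The project name comes from this report, but the `get_calls()` accessor and the field names (`inputs["model"]`, `started_at`) are assumptions about how these particular traces were logged.

```python
import weave

# Connect to the project analyzed in this report.
client = weave.init("wandb-applied-ai-team/mcp-tests")

# Pull the logged calls and keep those that record an OpenAI model.
# The shape of each call (an inputs dict with a "model" key and a
# started_at timestamp) is an assumption about how traces were recorded.
traces = []
for call in client.get_calls():
    model = (call.inputs or {}).get("model")
    if model:
        traces.append({"model": model, "started_at": call.started_at})
```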

## Model Distribution

The analysis reveals two primary models used in the system:

- **GPT-4 Turbo (1106)**: Used primarily for evaluation tasks (~70% of calls)
- **GPT-4o**: Used mainly for generating responses (~30% of calls)

This pattern demonstrates a dual-model architecture where one model generates responses and another evaluates them for quality assurance.
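A distribution like this can be tallied directly from the extracted traces. The sketch below assumes the `traces` list from the previous example; the model identifier strings in the output comment are illustrative (GPT-4 Turbo (1106) is published under the `gpt-4-1106-preview` identifier).

```python
from collections import Counter

# Count calls per model across the extracted traces.
model_counts = Counter(t["model"] for t in traces)
total = sum(model_counts.values())

for model, count in model_counts.most_common():
    print(f"{model}: {count} calls ({count / total:.0%})")

# For the 50 traces in this report, the split would look like:
#   gpt-4-1106-preview: 35 calls (70%)
#   gpt-4o: 15 calls (30%)
```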

## Time-Based Analysis

Based on our analysis of 50 OpenAI chat traces, we observed the following pattern across a 13-minute window (14:07-14:19 UTC):

- Both models are used consistently throughout the time period
- GPT-4 Turbo (1106) is used more frequently, representing approximately 70% of all API calls
- GPT-4o is used for approximately 30% of all API calls
- The pattern suggests a system where GPT-4o generates responses and GPT-4 Turbo evaluates them
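To see the per-minute pattern rather than just the totals, the same traces can be bucketed by minute. This continues the sketch above and assumes each `started_at` value is a datetime.

```python
from collections import Counter, defaultdict

# Bucket calls into one-minute bins so per-model volume can be
# compared across the 14:07-14:19 UTC window.
per_minute = defaultdict(Counter)
for t in traces:
    minute = t["started_at"].strftime("%H:%M")
    per_minute[minute][t["model"]] += 1

for minute in sorted(per_minute):
    print(minute, dict(per_minute[minute]))
```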

## Conclusions

This dual-model approach demonstrates an effective architecture: one model (GPT-4o) generates responses while another (GPT-4 Turbo) serves as a quality-control mechanism, evaluating those responses before they reach users. The pattern may be useful for other applications that require both generation and evaluation components.
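As a rough illustration of the pattern (not Wandbot's actual code), the sketch below generates an answer with one model and has a second model review it. The helper name and review prompt are invented for this example; the model identifiers correspond to the two models named in this report.

```python
from openai import OpenAI

client = OpenAI()

def answer_with_review(question: str) -> tuple[str, str]:
    """Generate an answer with one model, then score it with a second."""
    # Generation step (gpt-4o in this report's setup).
    answer = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content

    # Evaluation step (GPT-4 Turbo 1106 in this report's setup).
    # The review prompt here is invented for this example.
    review_prompt = (
        f"Rate the following answer to the question '{question}' as "
        f"PASS or FAIL, with one sentence of reasoning:\n\n{answer}"
    )
    verdict = client.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[{"role": "user", "content": review_prompt}],
    ).choices[0].message.content

    return answer, verdict
```

In a production system the verdict would typically gate delivery, triggering a retry or a fallback response when the evaluator fails the answer.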