
# OpenAI Model Usage Analysis

## Overview

This report analyzes the usage patterns of OpenAI models in the wandb-applied-ai-team/mcp-tests Weave project. The data shows how different models serve distinct roles over time in the Wandbot technical support system.
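For readers who want to reproduce the numbers below, here is a minimal sketch of pulling the chat traces with the Weave Python client. The project name comes from this report, but the `get_calls()` accessor and the field names (`inputs["model"]`, `started_at`) are assumptions about how these particular traces were logged.

```python
import weave

# Connect to the project analyzed in this report.
client = weave.init("wandb-applied-ai-team/mcp-tests")

# Pull the logged calls and keep those that record an OpenAI model.
# The shape of each call (an inputs dict with a "model" key and a
# started_at timestamp) is an assumption about how traces were recorded.
traces = []
for call in client.get_calls():
    model = (call.inputs or {}).get("model")
    if model:
        traces.append({"model": model, "started_at": call.started_at})
```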

## Model Distribution

The analysis reveals two primary models used in the system:

- **GPT-4 Turbo (1106)**: Used primarily for evaluation tasks (~70% of calls)
- **GPT-4o**: Used mainly for generating responses (~30% of calls)

This pattern demonstrates a dual-model architecture where one model generates responses and another evaluates them for quality assurance.
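A distribution like this can be tallied directly from the extracted traces. The sketch below assumes the `traces` list from the previous example; the model identifier strings in the output comment are illustrative (GPT-4 Turbo (1106) is published under the `gpt-4-1106-preview` identifier).

```python
from collections import Counter

# Count calls per model across the extracted traces.
model_counts = Counter(t["model"] for t in traces)
total = sum(model_counts.values())

for model, count in model_counts.most_common():
    print(f"{model}: {count} calls ({count / total:.0%})")

# For the 50 traces in this report, the split would look like:
#   gpt-4-1106-preview: 35 calls (70%)
#   gpt-4o: 15 calls (30%)
```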

## Time-Based Analysis

Based on our analysis of 50 OpenAI chat traces, we observed the following pattern across a 13-minute window (14:07-14:19 UTC):

- Both models are used consistently throughout the time period
- GPT-4 Turbo (1106) is used more frequently, representing approximately 70% of all API calls
- GPT-4o is used for approximately 30% of all API calls
- The pattern suggests a system where GPT-4o generates responses and GPT-4 Turbo evaluates them
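To see the per-minute pattern rather than just the totals, the same traces can be bucketed by minute. This continues the sketch above and assumes each `started_at` value is a datetime.

```python
from collections import Counter, defaultdict

# Bucket calls into one-minute bins so per-model volume can be
# compared across the 14:07-14:19 UTC window.
per_minute = defaultdict(Counter)
for t in traces:
    minute = t["started_at"].strftime("%H:%M")
    per_minute[minute][t["model"]] += 1

for minute in sorted(per_minute):
    print(minute, dict(per_minute[minute]))
```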

## Conclusions

This dual-model approach demonstrates an effective architecture: one model (GPT-4o) generates responses while another (GPT-4 Turbo) serves as a quality-control mechanism, evaluating those responses before they reach users. The pattern may be useful for other applications that require both generation and evaluation components.
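As a rough illustration of the pattern (not Wandbot's actual code), the sketch below generates an answer with one model and has a second model review it. The helper name and review prompt are invented for this example; the model identifiers correspond to the two models named in this report.

```python
from openai import OpenAI

client = OpenAI()

def answer_with_review(question: str) -> tuple[str, str]:
    """Generate an answer with one model, then score it with a second."""
    # Generation step (gpt-4o in this report's setup).
    answer = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content

    # Evaluation step (GPT-4 Turbo 1106 in this report's setup).
    # The review prompt here is invented for this example.
    review_prompt = (
        f"Rate the following answer to the question '{question}' as "
        f"PASS or FAIL, with one sentence of reasoning:\n\n{answer}"
    )
    verdict = client.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[{"role": "user", "content": review_prompt}],
    ).choices[0].message.content

    return answer, verdict
```

In a production system the verdict would typically gate delivery, triggering a retry or a fallback response when the evaluator fails the answer.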