
How to Fine-Tune Your OpenAI GPT-3.5 and GPT-4 Models with Weights & Biases

Announcing WandbLogger, Weights & Biases' dedicated OpenAI fine-tuning logger for tracking fine-tuning metrics, data, and checkpoints. This article explores what it can do.

Introducing the W&B OpenAI fine-tuning logger

OpenAI's fine-tuning API can deliver better performance than few-shot prompting or prompt engineering, especially when you have a dataset larger than a few hundred samples. With GPT-4 fine-tuning available from OpenAI, the ability to customize GPT-4 on your own internal data opens up a huge world of possibilities.
To provide seamless MLOps support for users of OpenAI's fine-tuning API, Weights & Biases is launching WandbLogger in our Python SDK. You can now keep track of your OpenAI fine-tuning runs, version your fine-tuned models, visualize your training and validation datasets, and store your entire configuration. All of this functionality is available through a single line of code:
from wandb.integration.openai.fine_tuning import WandbLogger

# OpenAI fine-tuning code ...

# Once the OpenAI training job has started, run this line:
WandbLogger.sync(fine_tune_job_id='YOUR OPENAI JOB ID')
Using this line of code you can turn on OpenAI fine-tuning logging for:
✅ Training and validation metrics
✅ Training config
✅ Dataset visualization with W&B Tables
✅ Dataset versioning
✅ Model versioning
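For context, here is a minimal end-to-end sketch of the workflow, assuming the OpenAI Python client (v1+); the file name and base model below are placeholders, not values from this report. You upload your training data, create the fine-tuning job, then hand the job ID to WandbLogger.sync:
from openai import OpenAI
from wandb.integration.openai.fine_tuning import WandbLogger

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Upload the training file (JSONL in the chat fine-tuning format); the file name is a placeholder
train_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

# Create the fine-tuning job; the base model here is just an example
job = client.fine_tuning.jobs.create(training_file=train_file.id, model="gpt-3.5-turbo")

# Sync metrics, datasets, and model metadata for this job to W&B
WandbLogger.sync(fine_tune_job_id=job.id)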


If you'd like to dive in, the Colab associated with this report is a great place to start. Just click the link below!



For the remainder of this report, we'll look through all the features you get out of the box by using WandbLogger.sync. If you're looking for a detailed tutorial on how to fine-tune a GPT-3.5 or GPT-4 model using the OpenAI APIs, consider checking out the two blog posts below:


How to use the OpenAI fine-tuning WandbLogger class

The WandbLogger packs a ton of useful features. When you combine these with the rest of the W&B platform, it's easy to build a robust MLOps pipeline that systematizes your fine-tuning.
Import the WandbLogger from wandb.integration.openai.fine_tuning and call WandbLogger.sync to log finished or ongoing OpenAI training runs. You can configure the sync method using the following arguments (a hedged example call follows the list):
  • fine_tune_job_id: This is the OpenAI Fine-Tune ID you get when you create your fine-tune job using client.fine_tuning.jobs.create. If this argument is None, all the OpenAI fine-tune jobs that haven't already been synced will be synced to W&B.
  • openai_client: You can pass an initialized OpenAI client to sync, although it's not necessary. If no client is provided, the logger initializes one itself. In most cases, you can leave this argument as None.
  • num_fine_tunes: If no ID is provided, all the unsynced fine-tunes will be logged to W&B. This argument lets you select the number of recent fine-tunes to sync; if num_fine_tunes is 5, the 5 most recent fine-tunes are selected.
  • project: This is the W&B project where your fine-tune metrics, models, data, etc. will be logged. By default, the project name is "OpenAI-Fine-Tune."
  • entity: By default, this is your W&B username. Pass your W&B team name to this argument if you'd like to start logging to your team.
  • overwrite: As mentioned above, already-synced fine-tunes are not synced again. Use this argument to force a re-sync of training runs. By default this is False.
  • wait_for_job_success: An OpenAI fine-tuning job usually takes some time to complete. To ensure that your metrics are logged to W&B as soon as the fine-tune job finishes, this setting checks the job's status every 60 seconds until it changes to "succeeded". Once the job is detected as successful, the metrics are synced automatically to W&B. Set to True by default.
  • kwargs_wandb_init: Pass any additional arguments that you want to pass to wandb.init. See the reference documentation for wandb.init here.
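Putting those arguments together, an example call might look like the following sketch (the job ID, project, and entity values are placeholders, not real identifiers):
WandbLogger.sync(
    fine_tune_job_id="ftjob-abc123",  # a specific OpenAI job; None syncs all unsynced jobs
    openai_client=None,               # let the logger initialize its own client
    project="OpenAI-Fine-Tune",       # W&B project to log to
    entity="my-team",                 # your W&B username or team name
    overwrite=False,                  # set True to force a re-sync
    wait_for_job_success=True,        # poll every 60 seconds until the job succeeds
)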

Benefits of dataset versioning and visualization

Dataset visualization with W&B Tables

Datasets are also visualized as W&B Tables, which allows you to explore, search, and interact with the data.
Check out the training samples visualized using W&B Tables below. You can hover your mouse over the text to read the entire prompt, as well as filter for specific examples and more.



Dataset versioning with W&B Artifacts

The logger automatically logs the training and validation data (if provided) to W&B as an Artifact. Artifacts are a tool to store and version your datasets—as well as your model versions.
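If you want to pull a logged dataset back down programmatically, the W&B public API can fetch artifacts by name. Here is a small sketch; the entity, project, and artifact name are assumptions, so check the Artifacts tab of your project for the actual name the logger used:
import wandb

api = wandb.Api()
# Fetch a specific version of the logged training data (path components are placeholders)
artifact = api.artifact("my-entity/OpenAI-Fine-Tune/train-file:v0")
local_dir = artifact.download()  # downloads the training JSONL to a local directory
print(local_dir)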
Below is a view of the training file in Artifacts. Here you can see the W&B run that logged this file, when it was logged, what version of the dataset this is, the metadata, and DAG lineage from the training data to the trained model:



Training Metrics: Fine-tuning loss and accuracy

Loss and accuracy curves are the most basic outputs of any model training. The logger captures all the metrics automatically generated by the fine-tune job. Here are a few example metrics:



Training configuration a.k.a. hyperparameters

The logger captures the hyperparameters and logs them to W&B, ensuring that your training run is fully reproducible in case you need to run it again (or check whether you have already tried a certain set of hyperparameters).
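You can also read a past run's configuration back through the public API, which is handy when checking whether a hyperparameter combination has already been tried. A quick sketch, with placeholder entity, project, and run ID:
import wandb

api = wandb.Api()
run = api.run("my-entity/OpenAI-Fine-Tune/run-id")  # path components are placeholders
print(run.config)  # hyperparameters and settings logged for this fine-tune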
You can see all the hyperparameters of this experiment below:



The fine-tuned model and model versioning

The end result of fine-tuning is the fine-tuned model's ID. As you may have noticed in the config above, the fine_tuned_model key holds the ID of the fine-tuned model. The logger doesn't only capture the ID; it also captures the metadata and the data-model lineage:
  1. Since we can't log the model weights (OpenAI doesn't share them), the logger logs a model_metadata Artifact, which can then be versioned. It can also be linked to a model in the W&B Model Registry (a hedged sketch of registry linking follows this list) and even paired with W&B Launch. A host of W&B product features can be used once the logger creates this Artifact. Here's the model metadata file:


2. The DAG view of an ML workflow shows you the connection (or "lineage") between datasets, training jobs, and output models. This is a useful tool for finding bugs and for explaining and understanding the pipeline. The DAG view for the model_metadata artifact is shown below. Notice how the train and validation files were consumed by the fine-tune job to create the model_metadata:v0 artifact. This can be extended further with evaluation, CI/CD, and more.
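As promised above, here is a hedged sketch of linking the logged model_metadata artifact to the W&B Model Registry; the entity, project, artifact version, and registry path are all placeholders rather than values from this report:
import wandb

run = wandb.init(project="OpenAI-Fine-Tune", entity="my-entity", job_type="registry-link")
# Consume the logged metadata artifact and link it to a Model Registry collection
artifact = run.use_artifact("model_metadata:v0")
run.link_artifact(artifact, "model-registry/My GPT-3.5 Fine-Tune")
run.finish()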



Get Started Now

The WandbLogger can keep track of your fine-tuning jobs, give you detailed visibility into how your fine-tuning runs are performing, and make them reproducible. Try our Colab to get started:

Open in Colab →

We would love feedback on this logger. If you find any issues or bugs while using the WandbLogger, or simply find it useful, please open an issue in our wandb GitHub repository. We'd love to hear from you!