Machine Learning Experiment Tracking with Weights & Biases

The system of record for your model training

Track, compare, and visualize your ML models with 5 lines of code

Add experiment logging to your script with just a few lines of code and start logging results right away. Our lightweight integration works with any Python script.

    
import wandb

# 1. Start a W&B run
run = wandb.init(project="my_first_project")
# 2. Save model inputs and hyperparameters
config = wandb.config
config.learning_rate = 0.01
# 3. Log metrics to visualize performance over time
for i in range(10):
    loss = 1.0 / (i + 1)  # placeholder value; replace with your real training loss
    run.log({"loss": loss})

    
import os

import wandb
# Import paths below assume the classic `langchain` package layout
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.llms import OpenAI

# 1. Set environment variables for the W&B project and tracing
os.environ["LANGCHAIN_WANDB_TRACING"] = "true"
os.environ["WANDB_PROJECT"] = "langchain-tracing"

# 2. Load llms, tools, and agents/chains
llm = OpenAI(temperature=0)
tools = load_tools(["llm-math"], llm=llm)
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

# 3. Serve the chain/agent with all underlying complex llm interactions automatically traced and tracked
agent.run("What is 2 raised to .123243 power?")

    
import wandb
from llama_index import ServiceContext
from llama_index.callbacks import CallbackManager, WandbCallbackHandler

# Initialise WandbCallbackHandler and pass any wandb.init args
wandb_args = {"project": "llamaindex"}
wandb_callback = WandbCallbackHandler(run_args=wandb_args)

# Pass wandb_callback to the service context
callback_manager = CallbackManager([wandb_callback])
service_context = ServiceContext.from_defaults(callback_manager=callback_manager)

    
import wandb

# 1. Start a new run
run = wandb.init(project="gpt5")
# 2. Save model inputs and hyperparameters
config = run.config
config.dropout = 0.01
# 3. Log gradients and model parameters
run.watch(model)
for batch_idx, (data, target) in enumerate(train_loader):
    ...
    if batch_idx % args.log_interval == 0:
        # 4. Log metrics to visualize performance
        run.log({"loss": loss})

    
import wandb
from transformers import Trainer, TrainingArguments

# 1. Define which wandb project to log to and name your run
run = wandb.init(project="gpt-5", name="gpt-5-base-high-lr")

# 2. Add wandb in your `TrainingArguments`
args = TrainingArguments(..., report_to="wandb")

# 3. W&B logging will begin automatically when you start training your Trainer
trainer = Trainer(..., args=args)
trainer.train()

    
from lightning.pytorch import Trainer
from lightning.pytorch.loggers import WandbLogger

# Initialise the logger
wandb_logger = WandbLogger(project="llama-4-fine-tune")

# Add configs such as batch size to the wandb config
wandb_logger.experiment.config["batch_size"] = batch_size

# Pass wandb_logger to the Trainer
trainer = Trainer(..., logger=wandb_logger)

# Train the model
trainer.fit(...)

    
import tensorflow as tf
import wandb

# 1. Start a new run
run = wandb.init(project="gpt4")

# 2. Save model inputs and hyperparameters
config = wandb.config
config.learning_rate = 0.01

# Model training here
# 3. Log metrics to visualize performance over time
with tf.Session() as sess:  # TensorFlow 1.x session API
    # ...
    wandb.tensorflow.log(tf.summary.merge_all())

    
import wandb
from wandb.keras import (
    WandbMetricsLogger,
    WandbModelCheckpoint,
)

# 1. Start a new run
run = wandb.init(project="gpt-4")

# 2. Save model inputs and hyperparameters
config = wandb.config
config.learning_rate = 0.01
...  # Define a model
# 3. Log layer dimensions and metrics
wandb_callbacks = [
    WandbMetricsLogger(log_freq=5),
    WandbModelCheckpoint("models"),
]
model.fit(
    X_train, y_train, validation_data=(X_test, y_test),
    callbacks=wandb_callbacks,
)

    
import wandb

wandb.init(project="visualize-sklearn")

# Model training here
# Log classifier visualizations
wandb.sklearn.plot_classifier(
    clf, X_train, X_test, y_train, y_test, y_pred, y_probas, labels,
    model_name="SVC", feature_names=None,
)

# Log regression visualizations
wandb.sklearn.plot_regressor(reg, X_train, X_test, y_train, y_test, model_name="Ridge")

# Log clustering visualizations
wandb.sklearn.plot_clusterer(kmeans, X_train, cluster_labels, labels=None, model_name="KMeans")

    
import wandb
import xgboost
from wandb.xgboost import wandb_callback

# 1. Start a new run
run = wandb.init(project="visualize-models")

# 2. Add the callback
bst = xgboost.train(param, xg_train, num_round, watchlist, callbacks=[wandb_callback()])

# Get predictions
pred = bst.predict(xg_test)

Visualize and compare every experiment

See model metrics stream live into interactive graphs and tables. It is easy to see how your latest ML model is performing compared to previous experiments, no matter where you are training your models.

Quickly find and re-run previous model checkpoints

Weights & Biases’ experiment tracking saves everything you need to reproduce models later: the latest git commit, hyperparameters, model weights, and even sample test predictions. You can save experiment files and datasets directly to Weights & Biases or store pointers to your own storage.

import wandb
from transformers import DebertaV2ForQuestionAnswering

# 1. Create a wandb run
run = wandb.init(project="turkish-qa")

# 2. Connect to the model checkpoint you want on W&B
wandb_model = run.use_artifact("sally/turkish-qa/deberta-v2:v5")

# 3. Download the model files to a directory
model_dir = wandb_model.download()

# 4. Load your model
model = DebertaV2ForQuestionAnswering.from_pretrained(model_dir)

From “When Inception-ResNet-V2 is too slow” by Stacey Svetlichnaya

Monitor your CPU and GPU usage

Visualize live metrics like GPU utilization to identify training bottlenecks and avoid wasting expensive resources.

Debug performance in real time

See how your model is performing and identify problem areas during training. We support rich media including images, video, audio, and 3D objects.

COVID-19 main protease in complex N3 (left) and COVID-19 main protease in complex with Z31792168 (right) from “Visualizing Molecular Structure with Weights & Biases” by Nicholas Bardy

Dataset versioning with deduplication: 100GB free storage

Automatically version logged datasets. Weights & Biases handles diffing and deduplication behind the scenes.
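
To make the idea of versioning with deduplication concrete, here is a toy sketch using only the Python standard library. It is an illustration of content-addressed storage in general, not a description of how Weights & Biases implements it; the `DedupStore` class and the file names are hypothetical.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical file contents are stored once."""

    def __init__(self):
        self.blobs = {}     # content digest -> file bytes
        self.versions = []  # one manifest per version: {filename: digest}

    def commit(self, files):
        """Record a new dataset version and return its version index."""
        manifest = {}
        for name, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            # Only store the bytes if this exact content has not been seen before.
            self.blobs.setdefault(digest, content)
            manifest[name] = digest
        self.versions.append(manifest)
        return len(self.versions) - 1

    def diff(self, old, new):
        """Compare two versions by filename and content digest."""
        a, b = self.versions[old], self.versions[new]
        return {
            "added": sorted(b.keys() - a.keys()),
            "removed": sorted(a.keys() - b.keys()),
            "changed": sorted(n for n in a.keys() & b.keys() if a[n] != b[n]),
        }


store = DedupStore()
v0 = store.commit({"train.csv": b"a,b\n1,2\n", "test.csv": b"a,b\n3,4\n"})
v1 = store.commit({
    "train.csv": b"a,b\n1,2\n5,6\n",  # modified
    "test.csv": b"a,b\n3,4\n",        # unchanged, so no new blob is stored
    "val.csv": b"a,b\n7,8\n",         # new file
})
changes = store.diff(v0, v1)
```

Here the two versions reference five files in total, but only four distinct blobs are stored, because the unchanged `test.csv` is shared between versions.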

MLOps Whitepaper

Read how building the right technical stack for your machine learning team supports core business efforts and safeguards IP.

Accessible anywhere

Check the latest training model and results on desktop and mobile. Use collaborative hosted projects to coordinate across your team.

The Weights & Biases platform helps you streamline your workflow from end to end

Models

Experiments

Track and visualize your ML experiments

Sweeps

Optimize your hyperparameters

Registry

Publish and share your ML models and datasets

Automations

Trigger workflows automatically

Launch

Package and run your ML workflow jobs

Weave

Traces

Explore and debug LLMs

Evaluations

Rigorous evaluations of GenAI applications

Core

Artifacts

Version and manage your ML pipelines

Tables

Visualize and explore your ML data

Reports

Document and share your ML insights

The Science of Debugging with W&B Reports

By Sarah Jane of Latent Space