
Getting Started with Weights and Biases



Weights and Biases (W&B) 💫

Weights and Biases is an MLOps platform built to facilitate collaboration and reproducibility across the machine learning development lifecycle. Machine learning projects can quickly become a mess without some best practices in place to aid developers and scientists as they iterate on models and move them to production.
W&B is lightweight enough to work with whatever framework or platform teams are currently using, but enables teams to quickly start logging their important results to a central system of record. On top of this system of record, W&B has built visualization, automation, and documentation capabilities for better debugging, model tuning, and project management.

Logging to the W&B System of Record

Experiment Tracking

W&B has a few core primitives that make up the SDK's experiment tracking system. You can log pretty much anything with W&B: scalar metrics, images, video, custom plots, etc.
To get an idea of the variety of data types you can log, check out the below report, which has code snippets for different media types that may pertain to your use case.
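For a quick taste, here's a minimal sketch (not from that report) that logs a scalar, an image built from a synthetic numpy array, and a matplotlib figure in a single run; it assumes numpy and matplotlib are installed and the data is just a stand-in for your own:

import numpy as np
import matplotlib.pyplot as plt
import wandb

with wandb.init(project="my_project", job_type="media_logging") as run:
    # Scalar metric
    wandb.log({"accuracy": 0.91})

    # Image from a numpy array (a file path or PIL image works as well)
    pixels = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
    wandb.log({"example_image": wandb.Image(pixels, caption="random pixels")})

    # Matplotlib figure rendered as an interactive chart
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
    wandb.log({"example_plot": fig})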

The canonical sections of code which require logging are the training loop and an evaluation job on a validation or golden dataset, but you can instrument any piece of code in your workflow, such as data pre-processing, augmentation, or generation. All you have to do is call wandb.init() and log diagnostic charts, metrics, and mixed media with wandb.log(). An executed piece of code contextualized by wandb.init() is called a run.
You can also embed rich media and plots into W&B Tables, which provide a persistent, interactive evaluation store for your models. More on them below 👇
### Generic Training Loop
with wandb.init(project="my_project", entity="my_team", job_type="training") as run:
    for epoch in range(epochs):
        # ... forward pass, loss computation, backward pass ...
        optimizer.step()
        # log metrics for this epoch
        wandb.log({"train_loss": train_loss,
                   "val_loss": val_loss})

### Evaluation Script to Assess Model Errors
with wandb.init(project="my_project", entity="my_team", job_type="evaluation") as run:
    val_preds = self.model.predict(val_data)

    # log validation predictions alongside the run
    columns = ["id", "image", "guess", "truth", "scores"]
    predictions_table = wandb.Table(columns=columns)
    # log image, predicted and actual labels, and all scores to an interactive table
    for img_id, img, top_guess, scores, truth in zip(self.generator.filenames,
                                                     val_data,
                                                     max_preds,
                                                     val_preds,
                                                     true_ids):
        predictions_table.add_data(img_id, wandb.Image(img), top_guess, truth, scores)
    wandb.log({"evaluation_table": predictions_table})
Here's a simple example using W&B to log some scalar metrics:
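The panels originally embedded at this point come from a private project, so the snippet below is a stand-in with simulated metric values:

import math
import wandb

with wandb.init(project="my_project", job_type="training") as run:
    for step in range(100):
        train_loss = math.exp(-step / 25)   # simulated, decaying loss
        accuracy = 1 - train_loss / 2       # simulated accuracy
        wandb.log({"train_loss": train_loss, "accuracy": accuracy})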



Artifact Tracking and Versioning

Artifacts are the inputs and outputs of each part of your machine learning pipeline, namely datasets and models. Training datasets change over time as new data is collected, removed, or re-labeled; models change as new architectures are implemented and as they are continuously re-trained. These changes affect every downstream task that uses the changed datasets and models, so understanding this dependency chain is critical for debugging effectively. W&B can log this dependency graph with a few lines of code.
Let's say we have a directory "sample_images" which stores a set of images and labels in our local development environment:
import wandb

with wandb.init(project="my_project", job_type="model_training") as run:
    # Create Artifact
    training_images = wandb.Artifact(name='training_images', type="training_data")
    # Add serialized data
    training_images.add_dir('sample_images')
    # Log to W&B, automatic versioning
    wandb.log_artifact(training_images)
The "sample_images" directory more often exists in some cloud object store like s3 or remote file system, in which case W&B can track references to the respective artifacts. In this case, W&B will still automatically version and provide durable URI's and user-defined aliases to the underlying artifacts.
# Log by reference if data sits in cloud object stores or remote file systems
with wandb.init(project="my_project", job_type="model_training") as run:
    # Create Artifact
    training_images = wandb.Artifact(name='training_images_reference', type="training_data")
    # Add a reference to the data; only the metadata associated with the artifact is stored
    training_images.add_reference(uri='file:///content/sample_images')
    # Log to W&B, automatic versioning
    wandb.log_artifact(training_images)
Once an artifact is logged, other users can download and inspect it using the Python SDK or the W&B CLI.
import wandb
run = wandb.init()
artifact = run.use_artifact('kenlee/my_project/training_images_reference:v0', type='training_data')
artifact_dir = artifact.download()
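Outside of a run, the public API can fetch the same artifact version; the sketch below assumes you are logged in to W&B and reuses the artifact path from above:

import wandb

# Fetch an artifact without starting a run, via the public API
api = wandb.Api()
artifact = api.artifact('kenlee/my_project/training_images_reference:v0', type='training_data')
artifact_dir = artifact.download()

# From the shell, `wandb artifact get kenlee/my_project/training_images_reference:v0`
# downloads the same files.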
With W&B Artifacts we obtain a complete picture of a machine learning pipeline so we can better understand how and where issues arise and isolate the problem area.



Interactive Tables

W&B Tables enable granular analysis of predictions and results through tabular data manipulation. Oftentimes, understanding a model's behavior during or after training requires more than seeing a clean loss curve go down and to the right. We need to understand where specifically the model fails, which examples give it trouble, where we might need to collect more training data or re-label, or even uncover more nuanced errors like numerical instability.
Tables can be used as a model evaluation store, which stores consolidated results on golden validation datasets across different trained models in your project. They can also be used as model leaderboards, where each row is a model class or architecture with embedded explainability or custom performance charts alongside them. These are both best practices which you can start incorporating with a few lines of code.
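As an illustration, a hypothetical model leaderboard table might look like the sketch below; the model names, metric values, and chart paths are placeholders for your own results:

import wandb

with wandb.init(project="my_project", job_type="leaderboard") as run:
    # One row per trained model, with a summary metric and an embedded chart
    leaderboard = wandb.Table(columns=["model", "val_accuracy", "confusion_matrix"])
    for name, acc, cm_path in [("resnet50_baseline", 0.91, "plots/resnet50_cm.png"),
                               ("efficientnet_b3", 0.93, "plots/effnet_cm.png")]:
        leaderboard.add_data(name, acc, wandb.Image(cm_path))
    wandb.log({"model_leaderboard": leaderboard})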



Hyperparameter Sweeps

One of the more tedious aspects of training deep learning models is tuning hyper-parameters. When we log runs in W&B, we can make W&B aware of their hyper-parameters. A central sweep controller can then delegate new hyper-parameter combinations based on a set of distributions we specify across the hyper-parameter space, either in Python or through a .yaml file. If we do a Bayesian search, W&B can even seed the search with previous runs we've already logged. Below is an example with a simple training function that exposes several hyper-parameters to W&B via wandb.config; wandb.sweep then initializes a hyper-parameter search using a dictionary of distributions over the hyper-parameter space.
def train(config=None):
    # Initialize a new wandb run
    with wandb.init(config=config):
        # If called by wandb.agent, as below,
        # this config will be set by the sweep controller
        config = wandb.config

        loader = build_dataset(config.batch_size)
        network = build_network(config.fc_layer_size, config.dropout)
        optimizer = build_optimizer(network, config.optimizer, config.learning_rate)

        for epoch in range(config.epochs):
            avg_loss = train_epoch(network, loader, optimizer)
            wandb.log({"loss": avg_loss, "epoch": epoch})

import math

sweep_config = {
    'method': 'random',
    # 'method': 'grid',
    # 'method': 'bayes',
}

parameters_dict = {
    'optimizer': {
        'values': ['adam', 'sgd']
    },
    'fc_layer_size': {
        'values': [128, 256, 512, 1024]
    },
    'dropout': {
        'values': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
    },
    # Static hyperparameter, notice singular key
    'epochs': {
        'value': 10
    },
    'learning_rate': {
        # Flat distribution between 0 and 0.25
        'distribution': 'uniform',
        'min': 0,
        'max': 0.25
    },
    'batch_size': {
        # Quantized log-uniform distribution; min/max are given in log space
        'distribution': 'q_log_uniform',
        'q': 1,
        'min': math.log(32),
        'max': math.log(256),
    }
}

### Initialize the central sweep controller
sweep_config['parameters'] = parameters_dict
sweep_id = wandb.sweep(sweep_config, project="sweeps-demo-pytorch")

### Run this on multiple machines/cores to distribute the hyperparameter search
wandb.agent(sweep_id, train, count=10)
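The same sweep configuration can also live in a .yaml file and be launched from the command line; once the sweep exists, starting additional agents on other machines (for example, wandb agent my_team/sweeps-demo-pytorch/<sweep_id>) pulls new hyper-parameter combinations from the same central controller. The entity and project here are just the placeholders used earlier in this report.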


W&B automatically groups the runs associated with a sweep and creates charts that let us do meta-analysis on which hyper-parameter combinations are working well.

(Panels: run sets from sweep 0ewut602)


Reports

W&B reports help contextualize and document the system of record built through logging diagnostics and results from different pieces of your pipeline. Reports are interactive and dynamic, reflecting filtered run sets logged in W&B. You can add all sorts of assets to a report; the one you are reading now includes plots, tables, images, code, and nested reports.
Whether you are writing technical summaries or regulatory documentation, or just want a real-time dashboard reflecting your team's progress, reports can be a best-practice documentation layer for your data science and machine learning projects. Check out the gallery below for some interesting ideas:


Going Beyond the Core W&B Primitives

wandb.log, wandb.Artifact, wandb.Table, and wandb.sweep can take you far in building your machine learning system of record, forming the core of the best practices we see top machine learning research teams employ in their everyday workflows. Beyond these primitives, our team continues to build integrations with higher-level frameworks and tools, where adding a single W&B callback or function argument logs everything automatically under the hood. Check out our integrations page and double-check the docs of your favorite machine learning repo, as there might already be a W&B integration in place! Let us know if you'd like to see W&B integrated in a package or tool we aren't yet logging!
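As one illustration, the Keras integration boils down to passing a single callback to model.fit. The sketch below trains a tiny model on synthetic data; the import path for WandbCallback can vary slightly between wandb versions, so treat it as an assumption rather than a definitive recipe:

import numpy as np
import wandb
from tensorflow import keras
from wandb.keras import WandbCallback  # import path may differ by wandb version

# Toy data and model; the callback streams losses, metrics,
# and system stats to W&B during training
x = np.random.rand(256, 10).astype("float32")
y = (x.sum(axis=1) > 5).astype("float32")

model = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

wandb.init(project="my_project", entity="my_team", job_type="training")
model.fit(x, y, epochs=5, validation_split=0.2, callbacks=[WandbCallback()])
wandb.finish()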

Summary

W&B can help practitioners and managers better understand the results and progress of each of these workstreams. Together, these capabilities enable teams to collaborate better, iterate faster, and move models into production more reliably.


Weights & Biases and TensorBoard