
W&B Demo



Weights and Biases (W&B) 💫

Weights and Biases is an MLOps platform built to facilitate collaboration and reproducibility across the machine learning development lifecycle. Machine learning projects can quickly become a mess without best practices in place to aid developers and scientists as they iterate on models and move them to production.
W&B is lightweight enough to work with whatever framework or platform a team is currently using, yet it lets teams quickly start logging their important results to a central system of record. On top of this system of record, W&B has built visualization, automation, and documentation capabilities for better debugging, model tuning, and project management.

Logging to the W&B System of Record

Setting the Table 🍽

Model Comparisons
vibrant-sweep-5 and celestial-sweep-4 were completed runs that tracked training a neural network on MNIST; the panels below compare them.


This set of panels contains runs from a private project, which cannot be shown in this report


model-1mqp861w
Artifact overview
Type: model
Created At: July 19th, 2022

Versions:

v1 (aliases: best, latest) | Created: Tue Jul 19 2022 | TTL Remaining: Inactive | # of Consuming Runs: 0 | Size: 1.4MB | m.score: 0.9648 | m.ModelCheckpoint: mode=max, monitor=val_accuracy, save_top_k=1, save_weights_only=false, _every_n_train_steps=0 | m.original_filename: epoch=1-step=1720.ckpt

v0 | Created: Tue Jul 19 2022 | TTL Remaining: Inactive | # of Consuming Runs: 0 | Size: 1.4MB | m.score: 0.9634 | m.ModelCheckpoint: mode=max, monitor=val_accuracy, save_top_k=1, save_weights_only=false, _every_n_train_steps=0 | m.original_filename: epoch=0-step=860.ckpt

Experiment Tracking

W&B has a few core primitives that comprise the experiment tracking system of the SDK. You can log pretty much anything with W&B: scalar metrics, images, video, custom plots, and more.
To get an idea of the variety of data types you can log, check out the report below, which has code snippets for the media types that may pertain to your use case.
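In the meantime, here is a minimal sketch of logging a few different data types from a single run (the project name, metric values, and media below are illustrative, not taken from this report):

import numpy as np
import wandb

with wandb.init(project="my_project", job_type="logging-demo") as run:
    # scalar metrics
    run.log({"train_loss": 0.42, "val_accuracy": 0.91})

    # an image from a numpy array, with a caption
    image = wandb.Image(np.random.randint(0, 255, (28, 28), dtype=np.uint8), caption="sample digit")

    # a simple custom line plot built from raw (x, y) pairs
    data = [[x, x ** 2] for x in range(10)]
    table = wandb.Table(data=data, columns=["x", "y"])
    plot = wandb.plot.line(table, "x", "y", title="x squared")

    run.log({"example_image": image, "example_plot": plot})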

The canonical sections of code that require logging are the training loop and an evaluation job on a validation or golden dataset, but you can log from any piece of code in your workflow, such as data pre-processing, augmentation, or generation. All you have to do is call wandb.init() and then log diagnostic charts, metrics, and mixed media with wandb.log(). An executed piece of code contextualized by wandb.init() is called a run.
You can also embed rich media and plots into W&B Tables, which provide a persistent, interactive evaluation store for your models. More on them below 👇
### Generic Training Loop
with wandb.init(project="my_project", entity="my_team", job_type="training") as run:
    for epoch in range(epochs):
        # ... forward pass, loss computation, and backward pass here ...
        optimizer.step()
        wandb.log({"train_loss": train_loss,
                   "val_loss": val_loss})

### Evaluation Script to Assess Model Errors
with wandb.init(project="my_project", entity="my_team", job_type="evaluation") as run:
    val_preds = self.model.predict(val_data)

    # log validation predictions alongside the run
    columns = ["id", "image", "guess", "truth", "scores"]
    predictions_table = wandb.Table(columns=columns)
    # log image, predicted and actual labels, and all scores to an interactive table
    for img_id, img, top_guess, scores, truth in zip(self.generator.filenames,
                                                     val_data,
                                                     max_preds,
                                                     val_preds,
                                                     true_ids):
        predictions_table.add_data(img_id, wandb.Image(img), top_guess, truth, scores)
    wandb.log({"evaluation_table": predictions_table})
Here's a simple example using W&B to log some scalar metrics:
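The panels backing this example come from a private project, so here is a minimal sketch of the kind of logging that would produce them (the metric names and values are synthetic placeholders):

import math
import wandb

with wandb.init(project="my_project", job_type="training") as run:
    # log a synthetic decaying loss and a rising accuracy, one point per step
    for step in range(100):
        run.log({"loss": math.exp(-step / 20),
                 "accuracy": 1 - math.exp(-step / 20)})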




Profiling Code

W&B supports rendering PyTorch traces using the Chrome Trace Viewer. There is an excellent W&B report available if you would like to dive deeper into the topic.
The setup can be particularly simple if you are already using PyTorch Lightning for your model development.
import glob

import wandb
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import WandbLogger
from torch.utils.data import DataLoader

wandb_logger = WandbLogger(project='MNIST', log_model='all', save_code=True)  # log all new checkpoints during training

training_loader = DataLoader(training_set, batch_size=64, shuffle=True, pin_memory=True)
validation_loader = DataLoader(validation_set, batch_size=64, pin_memory=True)
## Using raw DataLoaders, rather than a LightningDataModule, for greater transparency

# Set up model
model = MNIST_LitModule(n_layer_1=128, n_layer_2=128)

# assumes log_predictions_callback and checkpoint_callback are defined elsewhere
trainer = Trainer(gpus=None, max_epochs=5, profiler="pytorch", logger=wandb_logger,
                  callbacks=[
                      log_predictions_callback,
                      checkpoint_callback
                  ],
                  precision=32)
trainer.profiler.dirpath = "/content/wandb/latest-run/tbprofile"
trainer.fit(model, training_loader, validation_loader)

trace_files = glob.glob("/content/wandb/latest-run/tbprofile/*.pt.trace.json")
for i, trace_file in enumerate(trace_files):
    if "training_step" in trace_file:
        profile_art = wandb.Artifact(f"train-trace{i}-{wandb.run.id}", type="profile")
        profile_art.add_file(trace_file, "train_trace.pt.trace.json")
    else:
        profile_art = wandb.Artifact(f"validation-trace{i}-{wandb.run.id}", type="profile")
        profile_art.add_file(trace_file, "validation_trace.pt.trace.json")
    wandb.log_artifact(profile_art)
wandb.finish()
The logged trace can then be rendered in the W&B UI, and from there you can share it directly or incorporate it into your reports as needed.



Artifact Tracking and Versioning

Artifacts are the inputs and outputs of each part of your machine learning pipeline, namely datasets and models. Training datasets change over time as new data is collected, removed, or re-labeled, and models change as new architectures are implemented and as they are continually retrained. With these changes, every downstream task that uses the changed datasets and models is affected, and understanding this dependency chain is critical for debugging effectively. W&B can log this dependency graph with a few lines of code.
Let's say we have a directory "sample_images" that stores a set of images and labels in our local development environment:
import wandb

with wandb.init(project="my_project", job_type="model_training") as run:
    # Create Artifact
    training_images = wandb.Artifact(name='training_images', type="training_data")
    # Add serialized data
    training_images.add_dir('sample_images')
    # Log to W&B, automatic versioning
    wandb.log_artifact(training_images)
The "sample_images" directory more often exists in some cloud object store like s3 or remote file system, in which case W&B can track references to the respective artifacts. In this case, W&B will still automatically version and provide durable URI's and user-defined aliases to the underlying artifacts.
# Log by reference if data sits in cloud object stores or remote file systems
with wandb.init(project="my_project", job_type="model_training") as run:
    # Create Artifact
    training_images = wandb.Artifact(name='training_images_reference', type="training_data")
    # Add a reference to the data, storing only the metadata associated with the artifact
    training_images.add_reference(uri='file:///content/sample_images')
    # Log to W&B, automatic versioning
    wandb.log_artifact(training_images)
Once an artifact is logged, other users can download and inspect it using the Python SDK or the W&B CLI.
import wandb
run = wandb.init()
artifact = run.use_artifact('kenlee/my_project/training_images_reference:v0', type='training_data')
artifact_dir = artifact.download()
With W&B Artifacts we obtain a complete picture of a machine learning pipeline so we can better understand how and where issues arise and isolate the problem area.
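To make that dependency graph explicit, a run can declare the artifacts it consumes as well as the ones it produces. Here is a minimal sketch, assuming the dataset artifact logged above and a hypothetical model checkpoint saved to model.pt:

import wandb

with wandb.init(project="my_project", job_type="model_training") as run:
    # Declare the dataset artifact as an input; this draws the lineage edge
    dataset = run.use_artifact('training_images:latest')
    dataset_dir = dataset.download()

    # ... train a model on the downloaded data and save it to model.pt ...

    # Log the resulting model as an output artifact of this run
    model_artifact = wandb.Artifact(name='mnist_model', type='model')
    model_artifact.add_file('model.pt')  # hypothetical checkpoint path
    run.log_artifact(model_artifact)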



Interactive Tables

W&B Tables enable granular analysis of predictions and results through tabular data manipulation. Oftentimes, understanding a model's behavior during or after training requires more than seeing a clean loss curve go down and to the right. We need to understand where specifically the model fails, which examples give it trouble, where we might need to collect more training data or re-label, or maybe even uncover more nuanced errors like numerical instability.
Tables can be used as a model evaluation store, which consolidates results on golden validation datasets across the different trained models in your project. They can also be used as model leaderboards, where each row is a model class or architecture with embedded explainability or custom performance charts alongside it. Both are best practices that you can start incorporating with a few lines of code.
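For example, here is a rough sketch of a leaderboard-style Table; the model names, metrics, and confusion-matrix image paths are hypothetical stand-ins for results gathered elsewhere in your pipeline:

import wandb

# hypothetical evaluation results collected from earlier evaluation runs
results = [
    {"name": "resnet18", "val_accuracy": 0.962, "confusion_matrix": "resnet18_cm.png"},
    {"name": "simple_cnn", "val_accuracy": 0.948, "confusion_matrix": "simple_cnn_cm.png"},
]

with wandb.init(project="my_project", job_type="evaluation") as run:
    leaderboard = wandb.Table(columns=["model", "val_accuracy", "confusion_matrix"])
    for result in results:
        # each row is a model, with a rich media panel embedded alongside its metrics
        leaderboard.add_data(result["name"],
                             result["val_accuracy"],
                             wandb.Image(result["confusion_matrix"]))
    run.log({"model_leaderboard": leaderboard})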






See the report below for more details.


Hyperparameter Sweeps

One of the more tedious aspects of training deep learning models is tuning hyperparameters. When we log runs in W&B, we can make W&B aware of those hyperparameters. A central sweep controller can then delegate new hyperparameter combinations based on a set of distributions we specify over the hyperparameter space, either in a YAML file or in a Python dictionary. If we run a Bayesian search, W&B can even seed the search with previous runs we've already logged. Below is an example in which a simple training function exposes several hyperparameters to W&B via wandb.config; wandb.sweep then initializes a hyperparameter search using a dictionary of distributions for the hyperparameter space.
import math

import wandb

def train(config=None):
    # Initialize a new wandb run
    with wandb.init(config=config):
        # If called by wandb.agent, as below,
        # this config will be set by the Sweep Controller
        config = wandb.config

        loader = build_dataset(config.batch_size)
        network = build_network(config.fc_layer_size, config.dropout)
        optimizer = build_optimizer(network, config.optimizer, config.learning_rate)

        for epoch in range(config.epochs):
            avg_loss = train_epoch(network, loader, optimizer)
            wandb.log({"loss": avg_loss, "epoch": epoch})

sweep_config = {
    'method': 'random',
    # 'method': 'grid',
    # 'method': 'bayes',
}

parameters_dict = {
    'optimizer': {
        'values': ['adam', 'sgd']
    },
    'fc_layer_size': {
        'values': [128, 256, 512, 1024]
    },
    'dropout': {
        'values': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
    },
    # Static hyperparameter, notice the singular key
    'epochs': {
        'value': 10
    },
    'learning_rate': {
        # Flat distribution between 0 and 0.25
        'distribution': 'uniform',
        'min': 0,
        'max': 0.25
    },
    'batch_size': {
        # Quantized log-uniform distribution over [32, 256]
        'distribution': 'q_log_uniform',
        'q': 1,
        'min': math.log(32),
        'max': math.log(256),
    }
}

### Initializes the central sweep server
sweep_config['parameters'] = parameters_dict
sweep_id = wandb.sweep(sweep_config, project="sweeps-demo-pytorch")

### Run this on multiple machines/cores to distribute the hyperparameter search
wandb.agent(sweep_id, train, count=10)


W&B will automatically group the runs associated with a sweep and create charts that let us do meta-analysis on which hyperparameter combinations are working well.



Tensorboard + W&B

Simple! When initializing an experiment, add the sync_tensorboard=True argument.
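A minimal sketch (the project name is illustrative):

import wandb

# Anything written to TensorBoard after this call is mirrored to W&B
wandb.init(project="my_project", sync_tensorboard=True)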


Tensorboard + W&B Sweeps!

There are some very simple ways to explore sweeps without having to refactor your entire code base. Users who use TensorBoard to track a training experiment can take advantage of W&B's ability to sync TensorBoard logs to W&B.
import torch
import wandb
from torch.utils.tensorboard import SummaryWriter

sweep_config = {
    'method': 'grid',
    'metric': {
        'name': 'Loss/train',  ## matches what I write via SummaryWriter
        'goal': 'minimize'
    },
    'early_terminate': {
        'type': 'hyperband',
        'min_iter': 5
    },
    'parameters': {
        'learning_rate': {
            'values': [0.01, 0.005, 0.001, 0.0005, 0.0001]
        }
    }
}

x = torch.arange(-5, 5, 0.1).view(-1, 1)
y = -5 * x + 0.1 * torch.randn(x.size())

wandb.login()
wandb.tensorboard.patch(root_logdir="/content/runs")  ## since TensorBoard isn't running here, set the root logging directory manually

def sweep_train(config_defaults=None):
    config_defaults = {"learning_rate": 0.01}  ## will be overwritten during sweeps
    run = wandb.init(config=config_defaults, sync_tensorboard=True)  # the config gets overwritten by the Sweep Controller
    model = torch.nn.Linear(1, 1)
    writer = SummaryWriter(log_dir="/content/runs")
    criterion = torch.nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=wandb.config.learning_rate)
    for epoch in range(5):
        y1 = model(x)
        loss = criterion(y1, y)
        writer.add_scalar("Loss/train", loss, epoch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    writer.flush()
    writer.close()
    run.finish()

sweep_id = wandb.sweep(sweep_config, project="sweeps-pytorch-tensorboard-v4")
wandb.agent(sweep_id, function=sweep_train)

SageMaker + W&B Sweeps

Certainly the example above is useful, but it is probably not how you would want to execute HPO. You can leverage W&B as a controller to execute HPO on SageMaker! See the blog post and accompanying code for details.
If you can imagine, the sweep_train function is your original training method; with two additional lines of code you can start a W&B Sweeps run. One of those lines initiates a run (syncing to TensorBoard) and the other finishes the run.
SageMaker will spin up an AWS instance for each hyperparameter value and train the model. W&B tracks everything that happens and makes it easy to visualize the sweep. In one such search, we captured information across 100 W&B runs; we could focus our attention on the parts of the search that were more performant than others and easily compare all the runs. The test accuracy ranged from 10% to 76.45%, depending on the hyperparameters.
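As a rough sketch of the pattern (the container image, IAM role, instance type, and metric names below are placeholders rather than values from the blog post), a sweep agent can launch one SageMaker training job per suggested hyperparameter combination:

import wandb
from sagemaker.estimator import Estimator

sweep_config = {
    "method": "bayes",
    "metric": {"name": "test_accuracy", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"distribution": "uniform", "min": 0.0001, "max": 0.1},
        "batch_size": {"values": [32, 64, 128]},
    },
}

def launch_trial():
    # Each agent iteration becomes one SageMaker training job
    with wandb.init() as run:
        estimator = Estimator(
            image_uri="<your-training-image>",        # placeholder: container holding your training script
            role="<your-sagemaker-execution-role>",   # placeholder IAM role
            instance_count=1,
            instance_type="ml.m5.xlarge",
            hyperparameters={"learning_rate": run.config.learning_rate,
                             "batch_size": run.config.batch_size},
        )
        # The training script inside the container is expected to log its metrics
        # back to this same W&B run (e.g. by passing the API key and run id as
        # environment variables), so the sweep controller can see the results.
        estimator.fit(wait=True)

sweep_id = wandb.sweep(sweep_config, project="sagemaker-sweeps")
wandb.agent(sweep_id, function=launch_trial, count=100)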




Reports

W&B reports help contextualize and document the system of record built through logging diagnostics and results from different pieces of your pipeline. Reports are interactive and dynamic, reflecting filtered run sets logged in W&B. You can add all sorts of assets to a report; the one you are reading now includes plots, tables, images, code, and nested reports.
Whether you are writing technical summaries, regulatory documentation, or just want a real-time dashboard reflecting the progress of your team, reports can be a best practice documentation layer for your data science and machine learning projects. Check out the below gallery for some interesting ideas:


Going Beyond the Core W&B Primitives

wandb.log, wandb.Artifact, wandb.Table, and wandb.sweep can take you far in building your machine learning system of record, and they form the core of the best practices we see top machine learning research teams employ in their everyday workflows. Beyond these primitives, our team continues to build integrations with higher-level frameworks and tools, where simply adding a single W&B callback or function argument causes everything to be logged automatically under the hood. Check out our integrations page and double-check the docs of your favorite machine learning repo, as there might already be a W&B integration in place! Let us know if you'd like to see W&B integrated into a package or tool we aren't yet logging!
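For instance, here is a minimal sketch of the Keras integration, where a single WandbCallback handles the logging (the model and hyperparameters are illustrative):

import wandb
from wandb.keras import WandbCallback
from tensorflow import keras

wandb.init(project="my_project", config={"epochs": 5, "batch_size": 64})

# A tiny illustrative classifier for MNIST-sized inputs
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

(x_train, y_train), (x_val, y_val) = keras.datasets.mnist.load_data()
x_train, x_val = x_train / 255.0, x_val / 255.0

# The single callback logs losses, metrics, and system stats automatically
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=wandb.config.epochs,
          batch_size=wandb.config.batch_size,
          callbacks=[WandbCallback()])

wandb.finish()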


