W&B Demo
Contents: Weights and Biases (W&B) 💫 · Logging to the W&B System of Record · Setting the Table 🍽 · Title 2 · Experiment Tracking · Profiling Code · Artifact Tracking and Versioning · Interactive Tables · Hyperparameter Sweeps · Tensorboard + W&B · Tensorboard + W&B Sweeps! · SageMaker + W&B Sweeps · Reports · Going Beyond the Core W&B Primitives
Weights and Biases (W&B) 💫
Weights and Biases is an MLOps platform built to facilitate collaboration and reproducibility across the machine learning development lifecycle. Machine learning projects can quickly become a mess without some best practices in place to aid developers and scientists as they iterate on models and move them to production.
W&B is lightweight enough to work with whatever framework or platform a team is currently using, while enabling teams to quickly start logging their important results to a central system of record. On top of this system of record, W&B has built visualization, automation, and documentation capabilities for better debugging, model tuning, and project management.
Logging to the W&B System of Record
Setting the Table 🍽
[Run panels: vibrant-sweep-5 and celestial-sweep-4, completed runs tracking training of a neural network on MNIST. Additional panels reference runs from a private project and cannot be shown in this report.]
Title 2
[Embedded artifact panel: model-1mqp861w]
Type: model | Created: July 19th, 2022
Versions:
v1 (aliases: best, latest): created Tue Jul 19 2022, TTL Remaining: Inactive, consuming runs: 0, size: 1.4MB, original filename: epoch=1-step=1720.ckpt
v0: created Tue Jul 19 2022, TTL Remaining: Inactive, consuming runs: 0, size: 1.4MB, original filename: epoch=0-step=860.ckpt
Both versions carry ModelCheckpoint metadata (mode: max, monitor: val_accuracy, save_weights_only: false).
Experiment Tracking
W&B has a few core primitives that make up the experiment tracking system of the SDK. You can log pretty much anything with W&B: scalar metrics, images, video, custom plots, etc.
To get an idea of the variety of data types you can log, check out the report below, which has code snippets for the different media types that may pertain to your use case.
The canonical sections of code that require logging are the training loop and an evaluation job on a validation or golden dataset, but you can log any piece of code in your workflow, such as data pre-processing, augmentation, or generation. All you have to do is call wandb.init() and log diagnostic charts, metrics, and mixed media with wandb.log(). An executed piece of code contextualized by wandb.init() is called a run.
You can also embed rich media and plots into W&B Tables, which provide a persistent, interactive evaluation store for your models. More on them below 👇
### Generic Training Loop
with wandb.init(project="my_project", entity="my_team", job_type="training") as run:
    for i in range(epochs):
        optimizer.step()
        wandb.log({"train_loss": train_loss, "val_loss": val_loss})

### Evaluation Script to Assess Model Errors
with wandb.init(project="my_project", entity="my_team", job_type="evaluation") as run:
    val_preds = self.model.predict(val_data)

    # Log validation predictions alongside the run
    columns = ["id", "image", "guess", "truth", "scores"]
    predictions_table = wandb.Table(columns=columns)

    # Log the image, predicted and actual labels, and all scores to an interactive table
    for filepath, img, top_guess, scores, truth in zip(
        self.generator.filenames, val_data, max_preds, val_preds, true_ids
    ):
        predictions_table.add_data(filepath, wandb.Image(img), top_guess, truth, scores)
    wandb.log({"evaluation_table": predictions_table})
Here's a simple example using W&B to log some scalar metrics:
This set of panels contains runs from a private project, which cannot be shown in this report
Profiling Code
W&B supports rendering PyTorch traces using the Chrome Trace Viewer. There is an excellent W&B report available if you would like to dive deeper on the topic.
The setup can be particularly simple if you are already using PyTorch Lightning for your model development.
import glob

import wandb
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import WandbLogger
from torch.utils.data import DataLoader

# Log all new checkpoints during training
wandb_logger = WandbLogger(project='MNIST', log_model='all', save_code=True)

# Using raw DataLoaders, rather than a LightningDataModule, for greater transparency
training_loader = DataLoader(training_set, batch_size=64, shuffle=True, pin_memory=True)
validation_loader = DataLoader(validation_set, batch_size=64, pin_memory=True)

# Set up the model
model = MNIST_LitModule(n_layer_1=128, n_layer_2=128)

trainer = Trainer(
    gpus=None,
    max_epochs=5,
    profiler="pytorch",
    logger=wandb_logger,
    callbacks=[log_predictions_callback, checkpoint_callback],
    precision=32,
)
trainer.profiler.dirpath = "/content/wandb/latest-run/tbprofile"
trainer.fit(model, training_loader, validation_loader)

# Log the PyTorch profiler traces as W&B Artifacts
# trace_files = glob.glob("/content/lightning_logs/*.pt.trace.json")
trace_files = glob.glob("/content/wandb/latest-run/tbprofile/*.pt.trace.json")
for i, trace_file in enumerate(trace_files):
    if "training_step" in trace_file:
        profile_art = wandb.Artifact(f"train-trace{i}-{wandb.run.id}", type="profile")
        profile_art.add_file(trace_file, "train_trace.pt.trace.json")
    else:
        profile_art = wandb.Artifact(f"validation-trace{i}-{wandb.run.id}", type="profile")
        profile_art.add_file(trace_file, "validation_trace.pt.trace.json")
    wandb.log_artifact(profile_art)
wandb.finish()
The logged trace can then be rendered in the UI, and from there you can share it via the UI itself or incorporate it into your reports as needed.
Artifact Tracking and Versioning
Artifacts are the inputs and outputs of each part of your machine learning pipeline, namely datasets and models. Training datasets change over time as new data is collected, removed, or re-labeled; models change as new architectures are implemented and as they are continuously retrained. With these changes, all downstream tasks that use the changed datasets and models are affected, and understanding this dependency chain is critical for debugging effectively. W&B can log this dependency graph with a few lines of code.
Let's say we have a directory "sample_images" which stores a set of images and labels in our local development environment:
import wandb

with wandb.init(project="my_project", job_type="model_training") as run:
    # Create the Artifact
    training_images = wandb.Artifact(name='training_images', type="training_data")
    # Add the serialized data
    training_images.add_dir('sample_images')
    # Log to W&B; versioning is automatic
    wandb.log_artifact(training_images)
More often, the "sample_images" directory exists in a cloud object store like S3 or in a remote file system, in which case W&B can track references to the respective artifacts. W&B will still automatically version them and provide durable URIs and user-defined aliases for the underlying artifacts.
# Log by reference if data sits in cloud object stores or remote file systems
with wandb.init(project="my_project", job_type="model_training") as run:
    # Create the Artifact
    training_images = wandb.Artifact(name='training_images_reference', type="training_data")
    # Add a reference to the data; only the metadata associated with the artifact is stored
    training_images.add_reference(uri='file:///content/sample_images')
    # Log to W&B; versioning is automatic
    wandb.log_artifact(training_images)
Once an artifact is logged, other users can download and inspect it using the Python SDK or the W&B CLI.
import wandb

run = wandb.init()
artifact = run.use_artifact('kenlee/my_project/training_images_reference:v0', type='training_data')
artifact_dir = artifact.download()
With W&B Artifacts we obtain a complete picture of a machine learning pipeline so we can better understand how and where issues arise and isolate the problem area.
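To make that dependency graph concrete, a single run can both consume an artifact as an input and produce a new artifact as an output, which is what stitches the lineage together. Below is a minimal sketch, assuming the training_images artifact from above already exists; the model artifact name, checkpoint filename, and the training step itself are placeholders.

import wandb

with wandb.init(project="my_project", job_type="model_training") as run:
    # Declare the dataset artifact as an input of this run
    dataset = run.use_artifact('training_images:latest')
    dataset_dir = dataset.download()

    # ... train a model on the downloaded data (placeholder) ...

    # Log the trained model as an output artifact of the same run,
    # linking the model version back to the exact dataset version used
    model_artifact = wandb.Artifact(name='my_model', type='model')
    model_artifact.add_file('model.ckpt')  # hypothetical checkpoint written by the training step
    run.log_artifact(model_artifact)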

Interactive Tables
W&B Tables enable a granular analysis of predictions and results through tabular data manipulation. Oftentimes, understanding a model's behavior during or after training requires more than seeing a clean loss curve go down and to the right. We need to understand where specifically the model fails, what examples are giving it trouble, where we might need to collect more training data/re-label, or maybe even uncover more nuanced errors like numerical instability.
Tables can be used as a model evaluation store, which stores consolidated results on golden validation datasets across different trained models in your project. They can also be used as model leaderboards, where each row is a model class or architecture with embedded explainability or custom performance charts alongside them. These are both best practices which you can start incorporating with a few lines of code.
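As a rough sketch of the leaderboard pattern (the model names and metric values below are made up purely for illustration), each row of a Table can represent a trained model, and richer columns such as wandb.Image or custom plots can be embedded alongside the scalar metrics:

import wandb

with wandb.init(project="my_project", job_type="evaluation") as run:
    # One row per model or architecture; illustrative values only
    leaderboard = wandb.Table(columns=["model_name", "architecture", "val_accuracy"])
    leaderboard.add_data("baseline_mlp", "2-layer MLP", 0.91)
    leaderboard.add_data("small_cnn", "3-layer CNN", 0.95)
    wandb.log({"model_leaderboard": leaderboard})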
See the report below for more details.
Hyperparameter Sweeps
One of the more tedious aspects of training deep learning models is tuning hyperparameters. When we log runs in W&B, we can make W&B aware of those hyperparameters. A central sweep controller can then delegate new hyperparameter combinations based on a set of distributions we specify across the hyperparameter space, defined in a YAML file or a Python dictionary. If we run a Bayesian search, W&B can even seed the search with previous runs we've already logged. Below is an example where a simple training function exposes several hyperparameters to W&B via wandb.config, and a W&B sweep then runs a hyperparameter search using a dictionary of distributions over the hyperparameter space.
import math

import wandb

def train(config=None):
    # Initialize a new wandb run
    with wandb.init(config=config):
        # If called by wandb.agent, as below,
        # this config will be set by the Sweep Controller
        config = wandb.config
        loader = build_dataset(config.batch_size)
        network = build_network(config.fc_layer_size, config.dropout)
        optimizer = build_optimizer(network, config.optimizer, config.learning_rate)
        for epoch in range(config.epochs):
            avg_loss = train_epoch(network, loader, optimizer)
            wandb.log({"loss": avg_loss, "epoch": epoch})

sweep_config = {
    'method': 'random',
    # 'method': 'grid',
    # 'method': 'bayes',
}

parameters_dict = {
    'optimizer': {
        'values': ['adam', 'sgd']
    },
    'fc_layer_size': {
        'values': [128, 256, 512, 1024]
    },
    'dropout': {
        'values': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
    },
    # Static hyperparameter, notice the singular key
    'epochs': {
        'value': 10
    },
    'learning_rate': {
        # Flat distribution between 0 and 0.25
        'distribution': 'uniform',
        'min': 0,
        'max': 0.25
    },
    'batch_size': {
        'distribution': 'q_log_uniform',
        'q': 1,
        'min': math.log(32),
        'max': math.log(256),
    }
}

### Initialize the central sweep controller
sweep_config['parameters'] = parameters_dict
sweep_id = wandb.sweep(sweep_config, project="sweeps-demo-pytorch")

### Run this on multiple machines/cores to distribute the hyperparameter search
wandb.agent(sweep_id, train, count=10)

W&B will automatically group the runs associated with a sweep and generate charts that allow us to do more meta-analysis on which hyperparameter combinations are working well.
[Panels: run sets from sweep 0ewut602]
Tensorboard + W&B
Simple! When initializing an experiment, add the sync_tensorboard=True argument.
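A minimal sketch (the project name here is a placeholder): any metrics your code subsequently writes through Tensorboard will be mirrored to the W&B run.

import wandb

# Metrics written via SummaryWriter / tf.summary after this call
# are synced to the W&B run automatically
wandb.init(project="my_project", sync_tensorboard=True)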

Tensorboard + W&B Sweeps!
There are some very simple ways to explore sweeps without having to refactor your entire code base. Users who track their training experiments with Tensorboard can take advantage of W&B's ability to sync Tensorboard logs to W&B via sync_tensorboard.
import torch
import wandb
from torch.utils.tensorboard import SummaryWriter

sweep_config = {
    'method': 'grid',
    'metric': {
        'name': 'Loss/train',  # matches the tag written via SummaryWriter
        'goal': 'minimize'
    },
    'early_terminate': {
        'type': 'hyperband',
        'min_iter': 5
    },
    'parameters': {
        'learning_rate': {
            'values': [0.01, 0.005, 0.001, 0.0005, 0.0001]
        }
    }
}

x = torch.arange(-5, 5, 0.1).view(-1, 1)
y = -5 * x + 0.1 * torch.randn(x.size())

wandb.login()
# Since Tensorboard isn't already running, set the root logging directory manually
wandb.tensorboard.patch(root_logdir="/content/runs")

def sweep_train(config_defaults=None):
    config_defaults = {"learning_rate": 0.01}  # will be overwritten during sweeps
    run = wandb.init(config=config_defaults, sync_tensorboard=True)  # the config gets overwritten in the sweep
    model = torch.nn.Linear(1, 1)
    writer = SummaryWriter(log_dir="/content/runs")
    criterion = torch.nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=wandb.config.learning_rate)
    for epoch in range(5):
        y1 = model(x)
        loss = criterion(y1, y)
        writer.add_scalar("Loss/train", loss, epoch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    writer.flush()
    writer.close()
    run.finish()

sweep_id = wandb.sweep(sweep_config, project="sweeps-pytorch-tensorboard-v4")
wandb.agent(sweep_id, function=sweep_train)
SageMaker + W&B Sweeps
Certainly the example above is useful, but it is probably not how you would want to execute HPO. You can leverage W&B as a controller to execute HPO on SageMaker! See the blog and code for details.
If you can believe it, the sweep_train function above is just your original training method: with two additional lines of code you can turn it into a W&B sweep run. One of those lines initiates a run (syncing Tensorboard) and the other finishes out the run, as sketched below.
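Concretely, the pattern amounts to wrapping your existing Tensorboard-instrumented training body like this (a sketch, with the training code itself elided):

import wandb

def sweep_train(config_defaults=None):
    # Line 1: start a W&B run that syncs whatever Tensorboard writes,
    # with hyperparameters supplied by the sweep controller via wandb.config
    run = wandb.init(config=config_defaults, sync_tensorboard=True)

    # ... your original Tensorboard-instrumented training code goes here ...

    # Line 2: finish the run so the sweep agent can move on to the next trial
    run.finish()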
SageMaker will spin up an AWS instance for each hyperparameter value and train the model. W&B tracks everything that happens and makes it easy to visualize the sweep. Below we capture information across a hyperparameter search involving 100 W&B runs. We can visualize and focus our attention on the parts of the tuning that were more performant than others, and we have simple means to compare all the runs. The test accuracy ranges from 10% to 76.45%, depending on the hyperparameters.
[Panel: run set of 100 sweep runs]
Reports
W&B reports help contextualize and document the system of record built through logging diagnostics and results from different pieces of your pipeline. Reports are interactive and dynamic, reflecting filtered run sets logged in W&B. You can add all sorts of assets to a report; the one you are reading now includes plots, tables, images, code, and nested reports.
Whether you are writing technical summaries, regulatory documentation, or just want a real-time dashboard reflecting the progress of your team, reports can be a best practice documentation layer for your data science and machine learning projects. Check out the below gallery for some interesting ideas:
How To Fine-Tune Hugging Face Transformers on a Custom Dataset
In this article, we will learn how to easily fine-tune a HuggingFace Transformer on a custom dataset with Weights & Biases.
Cell Discovery Catalog
Gym-μRTS: Toward Affordable Deep Reinforcement Learning Research in Real-Time Strategy Games
Train agents to play an RTS game with commodity machines (one GPU, three vCPU, 16GB RAM)
Learning Dexterity End-to-End Using Weights & Biases Reports
In this article, Alex Paino explores how the OpenAI Robotics team uses Weights & Biases Reports to run massive machine learning projects.
AlphaFold-ed Proteins in W&B Tables
Visualize and analyze protein sequences and 3D structures with W&B Tables
Going Beyond the Core W&B Primitives
wandb.log, wandb.Artifact, wandb.Table, and wandb.sweep can take you far in building your machine learning system of record, forming the core of the best practices we see top machine learning research teams employ in their everyday workflows. Beyond these primitives, our team continues to build out integrations with higher-level frameworks and tools, where simply adding a single W&B callback or function argument causes everything to be logged automatically under the hood. Check out our integrations page and double-check the docs of your favorite machine learning repo, as there might already be a W&B integration in place! Let us know if you'd like to see W&B integrated in a package or tool we aren't yet logging.
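For example, with the Keras integration a single callback is enough to stream losses and metrics to W&B. This is only a minimal sketch: the project name, model, and randomly generated data are placeholders.

import numpy as np
import wandb
from wandb.keras import WandbCallback
from tensorflow import keras

# Placeholder data and model; swap in your own
x_train = np.random.rand(256, 784).astype("float32")
y_train = np.random.randint(0, 10, size=256)

wandb.init(project="my_project", config={"epochs": 2, "batch_size": 32})

model = keras.Sequential([keras.layers.Dense(10, activation="softmax", input_shape=(784,))])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# The single WandbCallback handles logging training and validation metrics
model.fit(
    x_train,
    y_train,
    validation_split=0.2,
    epochs=wandb.config.epochs,
    batch_size=wandb.config.batch_size,
    callbacks=[WandbCallback()],
)

wandb.finish()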