
W&B Training and POC Template

Created on July 21|Last edited on July 21


Weights and Biases (W&B) 💫

Weights and Biases is an MLOps platform built to facilitate collaboration and reproducibility across the machine learning development lifecycle. Machine learning projects can quickly become a mess without best practices in place to aid developers and scientists as they iterate on models and move them to production.
W&B is lightweight enough to work with whatever framework or platform teams are currently using, but enables teams to quickly start logging their important results to a central system of record. On top of this system of record, W&B has built visualization, automation, and documentation capabilities for better debugging, model tuning, and project management.


Pilot Plan



Success Criteria Example




Use Cases / Test Cases

Use Case

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua

Methodology

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua

Framework

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua

Environment

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua

Test Cases

  • Experiment Tracking / Model Comparison and Visuals
  • Tensorboard integration
  • Capture Code Profiling per Experiment
  • Artifact Tracking and Versions (Model provenance)
  • Create project dashboards via Tables
  • Sweeps for Hyperparameter optimization

Logging to the W&B System of Record

Experiment Tracking 🍽

The entry point for W&B usage is Experiments. The SDK's core primitives make up the experiment tracking system and take logging to a whole new level, letting you log pretty much anything with W&B: scalar metrics, images, video, custom plots, and more. Once the logging is complete, we then need to contextualize our experiments, and W&B provides the means to accomplish this via Reports.
Consider the example of an ML engineer joining a team that has been tasked with iterating on an existing model. With W&B as the system of record for ML development, the engineer can review others' work and get to work immediately.
Now, suppose this engineer has developed a promising model. They can begin to contextualize their experiments by way of reports, leveraging everything that was tracked during the course of their experiments; moreover, they can compare against earlier experiments with ease and share the results at a click!


We already saw a glimpse of a generic setup instrumenting W&B tracking within a Python notebook (a minimal sketch of that setup is below), and be aware there are also great integrations available that make experiment tracking even easier for popular frameworks and libraries!
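As a point of reference, here is a minimal sketch of that generic setup (the project name, config values, and metrics are placeholders):

import wandb

# Start a run; the config captures hyperparameters for later comparison
run = wandb.init(project="my_project", config={"learning_rate": 0.01, "epochs": 5})

for epoch in range(run.config.epochs):
    # Stand-ins for metrics computed by your actual training loop
    train_loss = 0.1 / (epoch + 1)
    val_accuracy = 0.80 + 0.02 * epoch
    # Log pretty much anything: scalars, images, audio, video, custom plots, etc.
    wandb.log({"train/loss": train_loss, "val/accuracy": val_accuracy, "epoch": epoch})

run.finish()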


Tensorboard + W&B

If you are already using TensorBoard, add the sync_tensorboard=True argument when initializing an experiment. This will take the metrics and details logged to TensorBoard and log them to W&B automatically. You will also get an in-browser TensorBoard experience.
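As a minimal sketch (the project name is a placeholder), the only W&B-specific change is the extra argument:

import wandb

# Everything written to TensorBoard after this call is mirrored to W&B
wandb.init(project="my_project", sync_tensorboard=True)

# ... your existing TensorBoard logging (SummaryWriter, tf.summary, etc.) runs unchanged ...

wandb.finish()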


To get an idea of the variety of data types you can log, check out the below report, which has code snippets for different media types that may pertain to your use case.


Profiling Code

W&B supports rendering PyTorch traces using the Chrome Trace Viewer. There is an excellent W&B report available if you would like to dive deeper into the topic.
The setup can be particularly simple if you are already using PyTorch Lightning for your model development.
import glob

import wandb
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import WandbLogger
from torch.utils.data import DataLoader

# training_set, validation_set, MNIST_LitModule, log_predictions_callback, and
# checkpoint_callback are defined earlier in the notebook

# Log all new checkpoints during training
wandb_logger = WandbLogger(project='MNIST', log_model='all', save_code=True)

# Using raw DataLoaders, rather than a LightningDataModule, for greater transparency
training_loader = DataLoader(training_set, batch_size=64, shuffle=True, pin_memory=True)
validation_loader = DataLoader(validation_set, batch_size=64, pin_memory=True)

# Set up model
model = MNIST_LitModule(n_layer_1=128, n_layer_2=128)

trainer = Trainer(gpus=None, max_epochs=5, profiler="pytorch", logger=wandb_logger,
                  callbacks=[
                      log_predictions_callback,
                      checkpoint_callback,
                  ],
                  precision=32)
trainer.profiler.dirpath = "/content/wandb/latest-run/tbprofile"
trainer.fit(model, training_loader, validation_loader)

# Log the PyTorch profiler traces as W&B Artifacts
trace_files = glob.glob("/content/wandb/latest-run/tbprofile/*.pt.trace.json")
for i, trace_file in enumerate(trace_files):
    if "training_step" in trace_file:
        profile_art = wandb.Artifact(f"train-trace{i}-{wandb.run.id}", type="profile")
        profile_art.add_file(trace_file, "train_trace.pt.trace.json")
    else:
        profile_art = wandb.Artifact(f"validation-trace{i}-{wandb.run.id}", type="profile")
        profile_art.add_file(trace_file, "validation_trace.pt.trace.json")
    wandb.log_artifact(profile_art)
wandb.finish()
These trace artifacts can be used to render the trace in the UI; from there, you can share it via the UI itself or incorporate the trace into your reports as needed, alongside system usage metrics.


Check out the report below for more detail


Artifact Tracking and Versioning



Artifacts are the inputs and outputs of each part of your machine learning pipeline, namely datasets and models. Training datasets change over time as new data is collected, removed, or re-labeled; models change as new architectures are implemented and re-training continues. With these changes, all downstream tasks that use the changed datasets and models are affected, and understanding this dependency chain is critical for debugging effectively. W&B can log this dependency graph easily with a few lines of code.
Let's say we have a directory "sample_images" which stores a set of images and labels in our local development environment:
import wandb

with wandb.init(project="my_project", job_type="model_training") as run:
    # Create Artifact
    training_images = wandb.Artifact(name='training_images', type="training_data")
    # Add serialized data
    training_images.add_dir('sample_images')
    # Log to W&B with automatic versioning
    wandb.log_artifact(training_images)
The "sample_images" directory more often exists in some cloud object store like s3 or remote file system, in which case W&B can track references to the respective artifacts. In this case, W&B will still automatically version and provide durable URI's and user-defined aliases to the underlying artifacts.
# Log by reference if data sits in cloud object stores or remote file systems
with wandb.init(project="my_project", job_type="model_training") as run:
    # Create Artifact
    training_images = wandb.Artifact(name='training_images_reference', type="training_data")
    # Add a reference to the data, storing only the metadata associated with the artifact
    training_images.add_reference(uri='file:///content/sample_images')
    # Log to W&B with automatic versioning
    wandb.log_artifact(training_images)
Once an artifact is logged, other users can download and inspect the artifact using the Python SDK or the W&B CLI.
import wandb
run = wandb.init()
artifact = run.use_artifact('tim-w/credit_scorecard/decent-puddle-21_model.json:v0', type='model')
artifact_dir = artifact.download()
With W&B Artifacts we obtain a complete picture of a machine learning pipeline so we can better understand how and where issues arise and isolate the problem area. Below is an example of a model that was logged to W&B.

decent-puddle-21_model.json — artifact overview
Type: model · Created: April 12th, 2022
Version v0 (latest; credit; moved-validation) · Tue Apr 12 2022 · TTL Inactive · 0 consuming runs · 33.4kB
(Embedded artifact panel: the artifact metadata records the full XGBoost learner configuration, e.g. objective binary:logistic, gradient booster gbtree, tree_method hist, learning_rate 0.1, max_depth 3, min_child_weight 100, seed 42.)

Interactive Tables

W&B Tables enable a granular analysis of predictions and results through tabular data manipulation. Oftentimes, understanding a model's behavior during or after training requires more than seeing a clean loss curve go down and to the right. We need to understand where specifically the model fails, what examples are giving it trouble, where we might need to collect more training data/re-label, or maybe even uncover more nuanced errors like numerical instability.
Tables can be used as a model evaluation store, which stores consolidated results on golden validation datasets across different trained models in your project. They can also be used as model leaderboards, where each row is a model class or architecture with embedded explainability or custom performance charts alongside them. These are both best practices which you can start incorporating with a few lines of code.
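As a minimal sketch of the first pattern (the project name and the validation_results iterable are hypothetical placeholders for your own evaluation outputs):

import wandb

run = wandb.init(project="my_project", job_type="evaluation")

# One row per prediction: identifier, input image, model guess, ground truth, confidence
columns = ["id", "image", "guess", "truth", "score"]
table = wandb.Table(columns=columns)

for example_id, (img, pred, label, score) in enumerate(validation_results):  # hypothetical iterable of results
    table.add_data(example_id, wandb.Image(img), pred, label, score)

run.log({"validation_predictions": table})
run.finish()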

(Embedded W&B Table: columns id, image, guess, truth, and per-class scores for Amphibia, Animalia, Arachnida, Aves, Fungi, Insecta, Mammalia, Mollusca, Plantae, and Reptilia, with one row per validation example.)


Hyperparameter Sweeps

One of the more tedious aspects of training deep learning models is tuning hyperparameters. When we log runs in W&B, we can make W&B aware of the hyperparameters. A central sweep controller can then delegate new hyperparameter combinations based on a set of distributions we specify across the hyperparameter space, defined in a YAML file or a Python dictionary. If we run a Bayesian search, W&B can even seed the search with previous runs we've already logged. Below is an example with a simple training function that exposes several hyperparameters to W&B via wandb.config; a W&B sweep then runs a hyperparameter search using the dictionary of distributions over the hyperparameter space.
import math

import wandb

def train(config=None):
    # Initialize a new wandb run
    with wandb.init(config=config):
        # If called by wandb.agent, as below,
        # this config will be set by the Sweep Controller
        config = wandb.config

        # build_dataset, build_network, build_optimizer, and train_epoch
        # are defined elsewhere in the notebook
        loader = build_dataset(config.batch_size)
        network = build_network(config.fc_layer_size, config.dropout)
        optimizer = build_optimizer(network, config.optimizer, config.learning_rate)

        for epoch in range(config.epochs):
            avg_loss = train_epoch(network, loader, optimizer)
            wandb.log({"loss": avg_loss, "epoch": epoch})

sweep_config = {
    'method': 'random',
    # 'method': 'grid',
    # 'method': 'bayes',
}

parameters_dict = {
    'optimizer': {
        'values': ['adam', 'sgd']
    },
    'fc_layer_size': {
        'values': [128, 256, 512, 1024]
    },
    'dropout': {
        'values': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
    },
    # Static hyperparameter, notice the singular key
    'epochs': {
        'value': 10
    },
    'learning_rate': {
        # Flat distribution between 0 and 0.25
        'distribution': 'uniform',
        'min': 0,
        'max': 0.25
    },
    'batch_size': {
        # Quantized log-uniform distribution between 32 and 256
        'distribution': 'q_log_uniform',
        'q': 1,
        'min': math.log(32),
        'max': math.log(256),
    }
}
sweep_config['parameters'] = parameters_dict

# Initialize the central sweep controller
sweep_id = wandb.sweep(sweep_config, project="sweeps-demo-pytorch")

# Run this on multiple machines/cores to distribute the hyperparameter search
wandb.agent(sweep_id, train, count=10)


W&B will automatically separate the runs associated with a sweep and create charts that allow us to do further meta-analysis on which combinations are working well.




Tensorboard + W&B Sweeps!

There are some very simple ways to explore sweeps without having to refactor your entire code base. Users who use TensorBoard to track a training experiment can take advantage of W&B's ability to sync TensorBoard logs to W&B via sync_tensorboard.
import torch
import wandb
from torch.utils.tensorboard import SummaryWriter

sweep_config = {
    'method': 'grid',
    'metric': {
        'name': 'Loss/train',  # matches what I write via SummaryWriter
        'goal': 'minimize'
    },
    'early_terminate': {
        'type': 'hyperband',
        'min_iter': 5
    },
    'parameters': {
        'learning_rate': {
            'values': [0.01, 0.005, 0.001, 0.0005, 0.0001]
        }
    }
}

x = torch.arange(-5, 5, 0.1).view(-1, 1)
y = -5 * x + 0.1 * torch.randn(x.size())

wandb.login()
# Since TensorBoard isn't running, set the root logging directory manually
wandb.tensorboard.patch(root_logdir="/content/runs")

def sweep_train(config_defaults=None):
    config_defaults = {"learning_rate": 0.01}  # will be overwritten during sweeps
    run = wandb.init(config=config_defaults, sync_tensorboard=True)  # the config gets overwritten by the sweep
    model = torch.nn.Linear(1, 1)
    writer = SummaryWriter(log_dir="/content/runs")
    criterion = torch.nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=wandb.config.learning_rate)
    for epoch in range(5):
        y1 = model(x)
        loss = criterion(y1, y)
        writer.add_scalar("Loss/train", loss, epoch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    writer.flush()
    writer.close()
    run.finish()

sweep_id = wandb.sweep(sweep_config, project="sweeps-pytorch-tensorboard-v4")
wandb.agent(sweep_id, function=sweep_train)

SageMaker + W&B Sweeps

Certainly the example above is useful, but it is probably not how you would want to execute HPO. You can leverage W&B as a controller to execute HPO on SageMaker! See the blog and code.
As you can imagine, the sweep_train function is your original training method; with two additional lines of code you can start a W&B sweeps run. One of those lines initiates a run (syncing to TensorBoard) and the other finishes out the run.
SageMaker will spin up an AWS instance for each hyperparameter value and train the model. W&B tracks everything that happens and makes it easy to visualize the sweep. Below we are capturing information across a hyperparameter search involving 100 W&B runs. We can visualize and focus our attention on the parts of the tuning that were more performant than others, with simple means to compare all the runs. The test accuracy ranges from 10% to 76.45%, depending on the hyperparameters.
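For illustration only, a rough, hypothetical sketch of the pattern might look like the following, where the W&B sweep agent acts as the controller and each agent call launches a SageMaker training job (the entry script, IAM role, instance type, and project name are placeholders; the linked blog and code show the full recipe):

import wandb
import sagemaker
from sagemaker.pytorch import PyTorch

sweep_config = {
    'method': 'bayes',
    'metric': {'name': 'val_accuracy', 'goal': 'maximize'},
    'parameters': {
        'learning_rate': {'distribution': 'uniform', 'min': 0.0001, 'max': 0.1},
        'batch_size': {'values': [32, 64, 128]},
    },
}

def launch_sagemaker_job():
    # Each call pulls one hyperparameter combination from the sweep controller
    run = wandb.init()
    estimator = PyTorch(
        entry_point="train.py",               # hypothetical training script that logs metrics via wandb
        role=sagemaker.get_execution_role(),  # assumes execution inside a SageMaker environment
        instance_count=1,
        instance_type="ml.p3.2xlarge",
        framework_version="1.12",
        py_version="py38",
        hyperparameters={
            "learning_rate": run.config.learning_rate,
            "batch_size": run.config.batch_size,
        },
    )
    estimator.fit()  # SageMaker spins up an instance and trains with these hyperparameters
    run.finish()

sweep_id = wandb.sweep(sweep_config, project="sagemaker-sweeps-demo")
wandb.agent(sweep_id, function=launch_sagemaker_job, count=10)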




Reports

W&B reports help contextualize and document the system of record built through logging diagnostics and results from different pieces of your pipeline. Reports are interactive and dynamic, reflecting filtered run sets logged in W&B. You can add all sorts of assets to a report; the one you are reading now includes plots, tables, images, code, and nested reports.
Whether you are writing technical summaries, regulatory documentation, or just want a real-time dashboard reflecting the progress of your team, reports can be a best practice documentation layer for your data science and machine learning projects. Check out the below gallery for some interesting ideas:


Going Beyond the Core W&B Primitives

wandb.log, wandb.Artifact, wandb.Table, and wandb.sweep can take you far in building your machine learning system of record, forming the core of the best practices we see top machine learning research teams employ in their everyday workflows. Beyond these primitives, our team continues to build out integrations with higher-level frameworks and tools, whereby simply adding a single W&B callback or function argument causes everything to be logged automatically under the hood. Check out our integrations page and double-check the docs of your favorite machine learning repo, as there might already be a W&B integration in place! Let us know if you'd like to see W&B integrated into a package or tool we aren't yet logging!
Be sure to check out our GitHub repo, which is crammed full of examples of W&B usage.