Figma Onboarding Guide
For any questions, post them in the #wandb-figma Slack channel.
Contents:
- Weights and Biases (W&B) 💫
- Quick Documentation Links
- W&B Authentication
- Experiment Tracking 🍽
- W&B Tables
- Artifact Tracking and Versioning
- Registry
- W&B Sweeps
- W&B Reports
- Integrations
- Other Useful Resources
- FAQ
Weights and Biases (W&B) 💫
Weights & Biases is an MLOps platform built to facilitate collaboration and reproducibility across the machine learning development lifecycle. Machine learning projects can quickly become a mess without best practices in place to aid developers and scientists as they iterate on models and move them to production.
W&B is lightweight enough to work with whatever framework or platform teams are currently using, but enables teams to quickly start logging their important results to a central system of record. On top of this system of record, W&B has built visualization, automation, and documentation capabilities for better debugging, model tuning, and project management.
Here's a YouTube video with an overview of Weights & Biases.
Quick Documentation Links
W&B Authentication
SDK Installation and Login
To start using W&B, you first need to install the Python package (if it isn't already installed):
pip install wandb
Once it's installed, authenticate your user account by logging in through the CLI or SDK. You should have received an email inviting you to sign up for the platform, after which you can obtain your API token:
wandb login --host <YOUR W&B HOST URL> <YOUR API TOKEN>
Or through Python:
import os
import wandb

wandb.login(host=os.getenv("WANDB_BASE_URL"), key=os.getenv("WANDB_API_KEY"))
Once you are logged in, you are ready to track your workflows!
Experiment Tracking 🍽
At the core of W&B is a Run, which is a logged unit of execution of Python code. A Run captures the entire execution context of that unit: Python library versions, hardware info, system metrics, git state, etc. To create a run, call wandb.init(). There are a number of important arguments you can pass to wandb.init() to provide additional context for the run and enable you to organize your runs later:
import wandb

wandb.init(
    project="my-sample-project",
    entity="<enter team name>",  # Team
    group="my_group",            # for organizing runs (e.g. distributed training)
    job_type="training",         # for organizing runs (e.g. preprocessing vs. training)
    config={
        "hyperparam1": 24,       # Hyperparams and other config
        "hyperparam2": "resnet",
    },
)
What Can I Log and How Do I Log It?
Within a run context, you can log all sorts of useful info such as metrics, visualizations, charts, and interactive data tables explicitly with wandb.log. Here is a comprehensive guide to wandb.log, along with its API docs.
Scalar Metrics
Scalar metrics can be logged by passing them to wandb.log in a dictionary keyed by metric name.
wandb.log({"my_metric": some_scalar_value})
Each time wandb.log is called, W&B increments an internal counter called step. This is the x-axis you see on all the time-series charts. If you call wandb.log once per epoch, then the step represents the epoch count, but you may also be calling it at other times in validation or testing loops, in which case the meaning of the step is less clear. To set the step manually, pass step=my_int_variable to wandb.log. This can be important for getting your charts at the resolution you want.
In PyTorch Lightning modules, for example, you may want to set step to trainer.global_step. It is recommended to pack as many metrics as you can into a single dictionary and log them in one go, rather than making separate wandb.log calls, each of which increments the step.
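Here is a minimal sketch of that pattern (the metric names and values are placeholders): several metrics are packed into one dictionary and logged with an explicit step so they all share the same x-axis value.

import wandb

wandb.init(project="my-sample-project")

for epoch in range(10):
    train_loss, val_loss = 1.0 / (epoch + 1), 1.2 / (epoch + 1)  # placeholder values
    # One call per epoch: both metrics land on the same step.
    wandb.log({"train/loss": train_loss, "val/loss": val_loss}, step=epoch)

wandb.finish()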
You will notice that if you log a scalar metric multiple times in a run, it will appear as a line chart with the step as the x-axis, and it will also appear in the Runs Table. The entry in the Runs Table is the summary metric, which defaults to the last value logged during the course of the run. You can change this behavior by setting the summary metric in the run using run.summary["my_metric_name"] = some_value. This is useful if you want to compare runs according to different aggregations of a given metric (e.g. mean, max, min) as opposed to simply the last one.
wandb.init()
for i in range(5):
    wandb.log({"my_metric": i})
wandb.summary["my_metric"] = 2  # 2 instead of the default 4
wandb.finish()
Rich Media (e.g. images)
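As a minimal sketch, rich media is logged by wrapping it in a W&B media type such as wandb.Image (wandb.Audio, wandb.Video, and wandb.Html work similarly); the image array and caption here are placeholders:

import numpy as np
import wandb

wandb.init(project="my-sample-project")

# Placeholder RGB image; in practice this could be a numpy array,
# a PIL image, or a path to an image file.
image = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)

wandb.log({"examples": [wandb.Image(image, caption="a sample input")]})
wandb.finish()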
W&B Tables
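Tables let you log rows of mixed data (including media types) that render as a sortable, filterable table in the UI. A minimal sketch, with illustrative columns and values:

import wandb

wandb.init(project="my-sample-project")

# Columns and rows are illustrative.
table = wandb.Table(columns=["id", "prediction", "label"])
table.add_data(0, "cat", "cat")
table.add_data(1, "dog", "cat")

wandb.log({"predictions_table": table})
wandb.finish()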
Artifact Tracking and Versioning
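Artifacts version files and directories (datasets, model weights) and track lineage between the runs that produce and consume them. A minimal sketch, where the artifact name and file path are placeholders:

import wandb

# Produce a new version of a model artifact.
run = wandb.init(project="my-sample-project", job_type="training")
artifact = wandb.Artifact("my-model", type="model")
artifact.add_file("model.pt")  # placeholder path
run.log_artifact(artifact)
run.finish()

# Consume the latest version in a downstream run.
run = wandb.init(project="my-sample-project", job_type="evaluation")
model_dir = run.use_artifact("my-model:latest").download()
run.finish()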
Registry
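The Registry gives artifacts an organization-wide home so other teams can discover and consume them. As a rough sketch, a logged artifact version can be linked into a Registry collection; the target path below is an assumption and depends on how your Registry is set up:

import wandb

run = wandb.init(project="my-sample-project")

artifact = wandb.Artifact("my-model", type="model")
artifact.add_file("model.pt")  # placeholder path
logged = run.log_artifact(artifact)

# Target path format is an assumption; check your org's Registry settings.
run.link_artifact(logged, "wandb-registry-model/my-collection")

run.finish()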
W&B Sweeps
Anything logged in wandb.config appears as a column in the runs table and is considered a hyperparameter in W&B. These hyperparameters can be viewed dynamically in a Parallel Coordinates Chart, which you can add and manipulate in a workspace. You can edit this chart to display different hyperparameters or different metrics. The lines in the chart are different runs which have "swept" through the hyperparameter space. You can also plot a parameter importance chart to get a sense of which hyperparameters are most important or correlated with the target metric. These importances are calculated using a random forest trained in your browser! Here are docs on the Parallel Coordinates Plot and the Parameter Importance Plot.
W&B provides a mechanism for automating hyperparameter search through W&B Sweeps. Sweeps allow you to configure a large set of experiments across a pre-specified hyperparameter space. To implement a sweep you just need to:
- Add wandb.init() to your training script, ensuring that all hyperparameters are passed to your training logic via wandb.config.
- Write a YAML file specifying your hyperparameter search, i.e. the search method and the hyperparameter distributions and values to search over.
- Run the sweep controller, which runs in W&B through wandb.sweep or through the UI. The controller will delegate new hyperparameter values to the wandb.config of the various agents running.
The agents will execute the training script, replacing wandb.config values with the queued hyperparameter values that the controller is keeping track of.
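A minimal sketch of the whole flow, assuming a train() function that reads its hyperparameters from wandb.config (the search space below uses the equivalent Python dict form instead of a YAML file; names and ranges are illustrative):

import wandb

# Illustrative search space; equivalent to the YAML form.
sweep_config = {
    "method": "bayes",
    "metric": {"name": "val/loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-1},
        "batch_size": {"values": [16, 32, 64]},
    },
}

def train():
    wandb.init()
    lr = wandb.config.learning_rate      # filled in by the sweep controller
    batch_size = wandb.config.batch_size
    # ... your training loop goes here ...
    wandb.log({"val/loss": 0.42})        # placeholder metric

sweep_id = wandb.sweep(sweep_config, project="my-sample-project")
wandb.agent(sweep_id, function=train, count=10)  # run 10 trials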
If you prefer to use other hyperparameter optimization frameworks, W&B has integrations with Ray Tune and Optuna, among others.
W&B Reports
Reports are flexible documents you can build on top of your W&B projects. You can easily embed any asset (chart, artifact, table) logged in W&B into a report alongside markdown, LaTeX, code blocks, etc. You can create rich documentation from your logged assets without copy-pasting static figures into Word docs or managing Excel spreadsheets. Reports are live: as new experiments run, they will update accordingly. This report you are viewing is a good example of what you can put into them.
Programmatic Reports
W&B Workspace API 🚀
What is the W&B Workspace API? It allows for the programmatic creation and manipulation of W&B workspaces, enhancing your workflow and productivity.
Highlights:
- Customizable Workspaces: Define, create, and customize workspaces with specific layouts, colors, and sections.
- Editable Views: Load, modify, and save changes to existing workspaces or create new views.
- Run Management: Programmatically filter, group, and sort runs, and customize their appearance.
For more details and code examples, check out the documentation and this tutorial. This is an optional Python package and can be installed with pip install wandb[workspaces]; it requires wandb v0.17.5 or later.
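As a hedged sketch of what this looks like (the entity, project, section, and panel names are placeholders; see the tutorial for the current API surface):

import wandb_workspaces.workspaces as ws
import wandb_workspaces.reports.v2 as wr  # panel types are shared with the Report API

workspace = ws.Workspace(
    name="My programmatic workspace",
    entity="<enter team name>",   # placeholder
    project="my-sample-project",  # placeholder
    sections=[
        ws.Section(
            name="Key metrics",
            panels=[wr.LinePlot(x="Step", y=["my_metric"])],
            is_open=True,
        ),
    ],
)
workspace.save()  # creates or updates the saved view in the W&B UI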
Integrations
PyTorch Lightning
Logging & Model Checkpointing in PyTorch Lightning
The WandbLogger in PyTorch Lightning integrates Weights and Biases for real-time experiment tracking, logging metrics, model checkpoints, and hyperparameters during the training process.
If you are using the WandbLogger with the PyTorch Lightning Trainer, the ModelCheckpoint callback will automatically log model checkpoints to W&B. See more details in the PyTorch Lightning integration docs, and run a live example in this Colab notebook.
W&B has many other integrations with frameworks like Keras and Hugging Face, which offer similar functionality.
from lightning.pytorch import Trainer
from lightning.pytorch.callbacks import ModelCheckpoint
from lightning.pytorch.loggers import WandbLogger

checkpoint_callback = ModelCheckpoint(monitor="val_accuracy", mode="max")

wandb_logger = WandbLogger(
    project="MNIST",     # group runs in "MNIST" project
    entity="smle-demo",
    log_model="all",     # log all new checkpoints during training
)

trainer = Trainer(
    logger=wandb_logger,           # W&B integration
    callbacks=[
        log_predictions_callback,  # logging of sample predictions (a custom callback defined elsewhere)
        checkpoint_callback,       # our model checkpoint callback
    ],
    accelerator="gpu",             # use GPU
    max_epochs=5,
)
Other Useful Resources
Import/Export API
All data logged to W&B can be accessed programmatically through the import/export API (also called the public API). This enables you to pull down run and artifact data, then filter and manipulate it however you please in Python.
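A minimal sketch, with placeholder entity and project names:

import wandb

api = wandb.Api()

# Filters use MongoDB-style queries; "<entity>" is a placeholder.
runs = api.runs("<entity>/my-sample-project", filters={"state": "finished"})

for run in runs:
    print(run.name, run.config.get("hyperparam1"), run.summary.get("my_metric"))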
Slack Alerts
You can set Slack alerts within a run to trigger when things happen in your training / evaluation scripts. For example, you may want to be notified when training is done or when a metric exceeds a certain value.
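A minimal sketch using wandb.alert (the metric and threshold are illustrative); where the alert is delivered is configured in your W&B user settings:

import wandb

wandb.init(project="my-sample-project")

accuracy = 0.91  # placeholder value from your evaluation loop
if accuracy > 0.9:
    wandb.alert(
        title="High accuracy",
        text=f"Accuracy {accuracy:.2f} exceeded the 0.9 threshold.",
        level=wandb.AlertLevel.INFO,
    )

wandb.finish()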
FAQ
1. Why can't I login to W&B?
A troubleshooting step we recommend when users are having trouble logging in is to reset the login credentials via the CLI:
- Run rm ~/.netrc in your terminal
- Run wandb login --relogin --host=https://api.wandb.ai
- Enter your API key when prompted
2. I didn't name my run. Where is the run name coming from?
Ans: If you do not explicitly name your run, a random run name will be assigned to the run to help identify the run in the UI. For instance, random run names will look like "pleasant-flower-4" or "misunderstood-glade-2".
3. How can I configure the name of the run in my training code?
Ans: At the top of your training script when you call wandb.init, pass in an experiment name, like this:
wandb.init(name="my_awesome_run")
4. If wandb crashes, will it possibly crash my training run?
Ans: It is extremely important to us that we never interfere with your training runs. We run wandb in a separate process to make sure that if wandb somehow crashes, your training will continue to run. If the internet goes out, wandb will continue to retry sending data to wandb.ai.
5. Why is a run marked crashed in W&B when it’s training fine locally?
This is likely a connection problem — if your server loses internet access and data stops syncing to W&B, we mark the run as crashed after a short period of retrying.
6. How do I stop wandb from writing to my terminal or my jupyter notebook output?
Ans: Set the environment variable WANDB_SILENT to true.
In Python
os.environ["WANDB_SILENT"] = "true"
Within Jupyter Notebook
%env WANDB_SILENT=true
With Command Line
WANDB_SILENT=true