hLevel PoC Guide
A one-stop shop for everything you need to test out during the W&B Pilot.
Contents
- Weights and Biases (W&B) 💫
- PoC Workshop Sessions
- Quick Documentation Links
- W&B Installation & Authentication
- Track any python process or experiment with W&B's Experiment Tracking 🍽
- What Can I Log and How Do I Log It?
- Scalar Metrics
- Distributed Training
- Visualize and query dataframes via W&B Tables
- Track and version any serialized data via W&B Artifacts
- House staged/candidate models via W&B's Registry
- Tune Hyperparameters via W&B Sweeps
- Organize visualizations and share your findings with collaborators via W&B Reports
- Track and evaluate GenAI applications via W&B Weave
- Other Useful Resources
- Import/Export API
- Slack Alerts
- FAQs
- W&B Models
- W&B Weave
Weights and Biases (W&B) 💫
Weights and Biases is an MLOps platform built to facilitate collaboration and reproducibility across the machine learning development lifecycle. Machine learning projects can quickly become a mess without some best practices in place to aid developers and scientists as they iterate on models and move them to production.
W&B is lightweight enough to work with whatever framework or platform teams are currently using, but enables teams to quickly start logging their important results to a central system of record. On top of this system of record, W&B has built visualization, automation, and documentation capabilities for better debugging, model tuning, and project management.
PoC Workshop Sessions
| Date | Time | Session | Recording Link | Topics Discussed |
|---|---|---|---|---|
| May 19th, 2025 | 10am PST | W&B 101 | Recording link | W&B Getting Started, Experiment Tracking |
| | | Office hours | Recording link | Session 2 |
Quick Documentation Links
- Outerbounds is built on Metaflow; here are the integration docs:
- Track and version datasets, models, and other artifacts: https://docs.wandb.ai/guides/artifacts (Code)
- Visualize your data with interactive tables: https://docs.wandb.ai/guides/data-vis/tables-quickstart (Code)
W&B Installation & Authentication
To start using W&B, you first need to install the Python package (if it's not already installed):
pip install wandb
Once it's installed, authenticate your user account by logging in through the CLI or SDK. You should have received an email to sign up for the platform, after which you can obtain your API token (the API token is in the "Settings" section under your profile):
wandb login --host <YOUR W&B HOST URL> <YOUR API TOKEN>
OR through Python:
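A minimal sketch, mirroring the CLI command above (the host URL and API key are placeholders):

import wandb

# Host and key are placeholders; your API key is under "Settings" in your profile
wandb.login(host="<YOUR W&B HOST URL>", key="<YOUR API TOKEN>")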
Once you are logged in, you are ready to track your workflows!
Track any python process or experiment with W&B's Experiment Tracking 🍽
At the core of W&B is a Run, which is a logged unit of execution of Python code. A Run captures the entire execution context of that unit: Python library versions, hardware info, system metrics, git state, etc. To create a run, call wandb.init(). There are a bunch of important arguments you can pass to wandb.init() to provide additional context for the run and enable you to organize your runs later:
import wandb

wandb.init(
    project="my-sample-project",
    entity="<enter team name>",  # Team
    group="my_group",            # for organizing runs (e.g. distributed training)
    job_type="training",         # for organizing runs (e.g. preprocessing vs. training)
    config={
        "hyperparam1": 24,       # Hyperparams and other config
        "hyperparam2": "resnet",
    },
)
What Can I Log and How Do I Log It?
Within a run context, you can log all sorts of useful info such as metrics, visualizations, charts, and interactive data tables explicitly with wandb.log. Here is a comprehensive guide to wandb.log, along with its API docs.
Scalar Metrics
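Logging scalar metrics is a single wandb.log call per step. A minimal sketch, assuming placeholder metric names and values:

import wandb

wandb.init(project="my-sample-project")

for epoch in range(10):
    train_loss = 1.0 / (epoch + 1)   # placeholder; compute your real metrics here
    val_acc = 0.80 + 0.02 * epoch    # placeholder

    # Each wandb.log call adds a new step to the run's charts
    wandb.log({"train/loss": train_loss, "val/accuracy": val_acc, "epoch": epoch})

wandb.finish()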
Distributed Training
W&B supports logging distributed training experiments. In distributed training, models are trained using multiple GPUs in parallel. W&B supports two patterns to track distributed training experiments:
- One process: Initialize W&B (wandb.init) and log experiments (wandb.log) from a single process. This is a common solution for logging distributed training experiments with the PyTorch Distributed Data Parallel (DDP) Class. In some cases, users funnel data over from other processes using a multiprocessing queue (or another communication primitive) to the main logging process.
- Many processes: Initialize W&B (wandb.init) and log experiments (wandb.log) in every process. Each process is effectively a separate experiment. Use the group parameter when you initialize W&B (wandb.init(group='group-name')) to define a shared experiment and group the logged values together in the W&B App UI.
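To make the "many processes" pattern concrete, here is a minimal sketch, assuming a torchrun-style launcher that sets the RANK environment variable; the project and group names are placeholders:

import os
import wandb

# Each process starts its own run; the shared group name ties them
# together as one experiment in the W&B App UI
rank = int(os.environ.get("RANK", 0))

wandb.init(
    project="my-sample-project",
    group="ddp-experiment-1",  # placeholder shared experiment name
    name=f"rank-{rank}",       # distinguish the per-process runs
)
wandb.log({"rank": rank})
wandb.finish()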
Visualize and query dataframes via W&B Tables
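Tables let you log rows of rich data (including text, images, and other media) and then slice, filter, and query them in the UI. A minimal sketch, with placeholder columns and rows:

import wandb

wandb.init(project="my-sample-project")

# Build a table with named columns, then add rows
table = wandb.Table(columns=["id", "prediction", "label"])
table.add_data(0, "cat", "cat")
table.add_data(1, "dog", "cat")

wandb.log({"predictions_table": table})
wandb.finish()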
Track and version any serialized data via W&B Artifacts
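Artifacts give you versioned storage for datasets, models, and other serialized data, with lineage between the runs that produce and consume them. A minimal sketch, assuming a placeholder local file path:

import wandb

run = wandb.init(project="my-sample-project", job_type="dataset-upload")

# Create a new artifact version and attach files to it
artifact = wandb.Artifact(name="my-dataset", type="dataset")
artifact.add_file("data/train.csv")  # placeholder path
run.log_artifact(artifact)

# A downstream run can consume (and record lineage for) the latest version:
# dataset_dir = run.use_artifact("my-dataset:latest").download()
run.finish()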
House staged/candidate models via W&B's Registry
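Once a model artifact looks promising, you can link it into the Registry so the team has one place for staged/candidate models. A minimal sketch; the registry target path is a placeholder that depends on how your organization's registry is set up:

import wandb

run = wandb.init(project="my-sample-project", job_type="model-promotion")

# Log the model checkpoint as an artifact first
model_art = wandb.Artifact(name="my-model", type="model")
model_art.add_file("model.pt")  # placeholder checkpoint path
run.log_artifact(model_art)

# Link the artifact version into a registry collection (placeholder path)
run.link_artifact(model_art, "<entity>/model-registry/<registered-model-name>")
run.finish()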
Tune Hyperparameters via W&B Sweeps
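Sweeps run a search over hyperparameters and track every trial as its own run. A minimal sketch, assuming placeholder hyperparameter names and a toy objective:

import wandb

# Sweep config: random search over two placeholder hyperparameters
sweep_config = {
    "method": "random",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "lr": {"min": 0.0001, "max": 0.1},
        "batch_size": {"values": [16, 32, 64]},
    },
}

def train():
    run = wandb.init()
    # The agent injects the sampled hyperparameters into run.config
    lr = run.config.lr
    run.log({"val_loss": 1.0 / lr})  # placeholder objective

sweep_id = wandb.sweep(sweep_config, project="my-sample-project")
wandb.agent(sweep_id, function=train, count=5)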
Organize visualizations and share your findings with collaborators via W&B Reports
Track and evaluate GenAI applications via W&B Weave
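Weave traces the inputs, outputs, latency, and cost of your GenAI code. A minimal sketch, with a placeholder project name and a stubbed model call:

import weave

weave.init("my-sample-project")  # placeholder project name

@weave.op()
def generate(prompt: str) -> str:
    # Placeholder for your real LLM call; every invocation of this
    # decorated function is traced in Weave with its inputs and outputs
    return "response to: " + prompt

generate("Hello, Weave!")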
Other Useful Resources
Import/Export API
All data logged to W&B can be accessed programmatically through the import/export API (also called the public API). This enables you to pull down run and artifact data, then filter and manipulate it however you please in Python.
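A minimal sketch, with placeholder entity and project names:

import wandb

api = wandb.Api()

# Fetch runs from a project and inspect their logged summary metrics
runs = api.runs("<entity>/my-sample-project")
for run in runs:
    print(run.name, run.state, run.summary.get("val_loss"))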
Slack Alerts
You can set Slack alerts within a run that trigger when things happen in your training / evaluation scripts. For example, you may want to be notified when training is done or when a metric exceeds a certain value.
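A minimal sketch, with a placeholder metric and threshold:

import wandb

wandb.init(project="my-sample-project")

acc = 0.92  # placeholder metric value
if acc > 0.9:
    # Sends a Slack (or email) alert, depending on your W&B user settings
    wandb.alert(
        title="High accuracy",
        text=f"Accuracy {acc} exceeded the 0.9 threshold",
    )
wandb.finish()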
FAQs
W&B Models
1. I didn't name my run. Where is the run name coming from?
If you do not explicitly name your run, a random run name will be assigned to the run to help identify the run in the UI. For instance, random run names will look like "pleasant-flower-4" or "misunderstood-glade-2".
2. How can I configure the name of the run in my training code?
At the top of your training script when you call wandb.init, pass in an experiment name, like this:
wandb.init(name="my_awesome_run")
3. If wandb crashes, will it possibly crash my training run?
It is extremely important to us that we never interfere with your training runs. We run wandb in a separate process to make sure that if wandb somehow crashes, your training will nevertheless continue to run.
4. Why is a run marked crashed in W&B when it’s training fine locally?
This is likely a connection problem — if your server loses internet access and data stops syncing to W&B, we mark the run as crashed after a short period of retrying.
5. Does W&B support Distributed training?
Yes, W&B supports distributed training; here's the detailed guide on how to log distributed training experiments.
6. Can I use PyTorch profiler with W&B?
Here's a detailed report that walks through using the PyTorch profiler with W&B along with this associated Colab notebook.
7. How do I stop wandb from writing to my terminal or my Jupyter notebook output?
Set the environment variable WANDB_SILENT to true.
In Python:
import os
os.environ["WANDB_SILENT"] = "true"
In a Jupyter notebook:
%env WANDB_SILENT=true
From the command line:
WANDB_SILENT=true
W&B Weave
1. How can I add a custom cost for my GenAI model?
You can add a custom cost by using the add_cost method. This guide walks you through the steps of adding a custom cost. Additionally, we have a cookbook on setting up a custom cost model, with an associated notebook.
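As a minimal sketch of add_cost (the model identifier and per-token prices are placeholders):

import weave

client = weave.init("my-sample-project")  # placeholder project name

# Register per-token prices for a custom model so Weave can compute call costs
client.add_cost(
    llm_id="my-custom-model",  # placeholder model identifier
    prompt_token_cost=0.00001,
    completion_token_cost=0.00003,
)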
2. How can I create my own custom Scorers with W&B Weave?
W&B Weave has its own predefined scorers that you can use, and you can also create your own scorers. This documentation walks through creating your own scorers with W&B Weave.
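As a minimal sketch of a function-based scorer (assuming the Weave convention where the model's output arrives in an argument named output, and expected is a placeholder dataset column):

import weave

@weave.op()
def exact_match(expected: str, output: str) -> dict:
    # Compare the model's output against the expected label from the dataset
    return {"correct": expected == output}

# Pass it to an evaluation, e.g.:
# evaluation = weave.Evaluation(dataset=examples, scorers=[exact_match])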
3. Can I control/customize the data that is logged?
Yes. If you want to change the data that is logged to Weave without modifying the original function (e.g. to hide sensitive data), you can pass postprocess_inputs and postprocess_output to the op decorator.
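A minimal sketch of postprocess_inputs (the field being redacted is a placeholder):

import weave

def redact(inputs: dict) -> dict:
    # Drop a sensitive field before the call is logged to Weave
    return {k: v for k, v in inputs.items() if k != "api_key"}

@weave.op(postprocess_inputs=redact)
def call_model(prompt: str, api_key: str) -> str:
    return "response"  # placeholder for the real model call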