
Beyond experiment tracking: best practices for W&B

This report shows some advanced tips for using W&B
Created on December 3|Last edited on December 12
In this report, we introduce W&B features that are useful beyond simple experiment management. If you are new to W&B, please refer to "For those who started to use W&B", which contains a list of W&B learning assets, before reading this report.





Experiments

How to collaborate with your teammates

Experiments in W&B are managed in a hierarchy of Entity => Project => Run. An entity represents a team unit. By default, you have a personal entity, but you can also create a team entity and manage a shared project as a team. Note that on individual or academic plans, you can only join one additional entity besides your personal one. Below the entity level are projects; as the name suggests, use one project per ML or DL project. Within a project you will conduct many experiments, and each of these is a Run managed under the project. Entities and projects are created manually, while Runs are created automatically with each execution.


Arguments that can be used when calling wandb.init()

Avoiding forgotten wandb.finish() calls by using a with statement

wandb.summary for explicit logging

Setting environment variables

Alert notifications via email or Slack

Artifacts

Basics of data versioning on W&B

Use W&B Artifacts to track and version any serialized data as the inputs and outputs of your W&B Runs. For example, a model training run might take in a dataset as input and produce a trained model as output. In addition to logging hyperparameters and metadata to a run, you can use an artifact to log the dataset used to train the model as input and the resulting model checkpoints as outputs. This way, you can always answer the question "what version of my dataset was this model trained on?"
The diagram below demonstrates how you can use artifacts throughout your entire ML workflow, as inputs and outputs of runs.

Basic usage
The following code shows the basic usage of Artifacts.
with wandb.init(project="artifacts-example", job_type="add-dataset") as run:
    # Create an artifact object with the wandb.Artifact API.
    artifact = wandb.Artifact(name="my_data", type="dataset")
    # Add one or more files, such as a model file or dataset, to your artifact object.
    artifact.add_file(local_path="./dataset.h5")  # Add the dataset file to the artifact
    # Log your artifact to W&B.
    run.log_artifact(artifact)  # Logs the artifact version "my_data:v0"
One of the powerful features of Artifacts is lineage. When you use W&B Artifacts, a lineage graph is created automatically, so you can easily see which model was trained on which dataset.

Some advanced features of Artifacts are introduced in the following sections.

With reference artifacts, you don't need to upload your data to W&B!

You may already have large datasets sitting in a cloud object store like Amazon S3 and just want to track which versions of those datasets your Runs use, along with any other metadata associated with them. You can do so by logging these artifacts by reference: W&B only tracks the checksums and metadata of the artifact and does not copy the data itself to W&B.
Assume we have a bucket with the following structure:
s3://my-bucket
+-- datasets/
|   +-- mnist/
+-- models/
    +-- cnn/
Under mnist, we have our dataset, a collection of images. Let's track it with an artifact:
import wandb

run = wandb.init()
artifact = wandb.Artifact("mnist", type="dataset")
artifact.add_reference("s3://my-bucket/datasets/mnist")
run.log_artifact(artifact)
You can use the artifact with the following code.
import wandb

run = wandb.init()
artifact = run.use_artifact("mnist:latest", type="dataset")
artifact_dir = artifact.download()
W&B Artifacts support any Amazon S3 compatible interface — including MinIO.
💡
To learn more about reference artifacts, please check the official documentation and the following report.


Adding New Versions and Automatically Removing Duplicates

Adding new versions of an artifact is very simple: as long as the same artifact name is used, versions are automatically managed as v0, v1, and so on.

First, let's look at the simplest method: log a new version of the artifact from a run that contains all of the artifact's files.
with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")

    # Add files and assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image1.png")
    run.log_artifact(artifact)
There is also a way to register only the differences from the previous version instead of saving all files again. When you add, change, or delete some files relative to the previous artifact version, the unchanged files do not need to be re-indexed; instead, a new artifact version is created as an incremental artifact.


Below is the workflow for incremental artifacts. For more details, check the official documentation.
with wandb.init(job_type="modify dataset") as run:
    saved_artifact = run.use_artifact(
        "my_artifact:latest"
    )  # fetch artifact and mark it as input to your run
    draft_artifact = saved_artifact.new_draft()  # create a draft version

    # modify a subset of files in the draft version
    draft_artifact.add_file("file_to_add.txt")
    draft_artifact.remove("dir_to_remove/")
    run.log_artifact(
        draft_artifact
    )  # log your changes to create a new version and mark it as output to your run


Artifacts in multiple runs

Table

Basics of visualizing data on W&B

Use W&B Tables to visualize and query tabular data. For example:
  • Compare how different models perform on the same test set
  • Identify patterns in your data
  • Look at sample model predictions visually
  • Query to find commonly misclassified examples
The following table shows semantic segmentation predictions with custom metrics. This sample project is from the W&B ML course. Please click the images and see how you can interactively change the visualization!


Basic usage
A Table is a two-dimensional grid of data where each column has a single type of data. Tables support primitive and numeric types, as well as nested lists, dictionaries, and rich media types.
import wandb

with wandb.init(project="table-test") as run:
    # wandb.Table(): Create a new table object.
    my_table = wandb.Table(columns=["a", "b"], data=[["a1", "b1"], ["a2", "b2"]])
    # run.log(): Log the table to save it to W&B.
    run.log({"Table Name": my_table})


Logging rich media types (images, audio, video, and so on) to W&B, with code examples

Filtering, grouping, sorting, and custom queries (Weave) in tables!

You can filter, group, sort, and more, all interactively in W&B Tables. In the following example, differences in variable distributions across targets are investigated.
You can also write queries on a W&B table. You can find the list of query operations here.




Save table as Artifacts

Report

Best practices for sharing your results

How to put different projects' results into a single report

Sharing, permission management, and collaboration (Comments!)

Automated report creation using the Python API



Ending / other features!

In addition to the above, Weights & Biases offers the following features. Please check the official documentation or courses if you want to learn more!

Sweeps

Model Registry

Launch

Automations

Traces

Weave

Monitoring

Kei Kamata