
W&B Models Product Features - Leonardo.AI

Hi Leonardo.AI team, we are pleased to present the following W&B features that you may find helpful for your existing workflows.


Sweeps

W&B provides a mechanism for automating hyper-parameter search through W&B Sweeps. Sweeps allows you to configure a large set of experiments across a pre-specified hyper-parameter space. To implement a sweep you just need to:
  1. Add wandb.init() to your training script, ensuring that all hyper-parameters are passed to your training logic via wandb.config.
  2. Write a yaml file with your hyper-parameter search specified i.e. method of search, hyper-parameter distributions and values to search over.
  3. Run the sweep controller, which runs in W&B through wandb.sweep or through the UI. The controller will delegate new hyperparameter values to wandb.config of the various agents running.
  4. Run agents with wandb.agent on however many machines you want to run the experiments on.
The agents will execute the training script, replacing the wandb.config values with the queued hyper-parameter values the controller keeps track of. A sketch of this workflow is shown below.
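As a rough sketch of how these pieces fit together in a single script (the training loop, metric, and hyper-parameter names below are illustrative placeholders, and the search space is passed as a Python dict instead of a separate yaml file):
import wandb

# Step 1: training logic reads its hyper-parameters from wandb.config
def train():
    run = wandb.init()
    lr = wandb.config.learning_rate
    batch_size = wandb.config.batch_size
    for epoch in range(10):
        loss = 1.0 / (epoch + 1) * lr * (64 / batch_size)  # placeholder metric
        wandb.log({"loss": loss, "epoch": epoch})
    run.finish()

# Steps 2-3: define the search space and start the sweep controller in W&B
sweep_config = {
    "method": "bayes",
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-2},
        "batch_size": {"values": [32, 64, 128]},
    },
}
sweep_id = wandb.sweep(sweep_config, project="sweeps-demo")

# Step 4: run an agent (repeat this call on as many machines as you like)
wandb.agent(sweep_id, function=train, count=10)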





Artifact Tracking and Versioning

Artifacts enable you to track and version any serialized data as the inputs and outputs of runs. This can be datasets (e.g. image files), evaluation results (e.g. heatmaps), or model checkpoints. W&B is agnostic to the format or structure of the data you want to log as an artifact.


Logging Artifacts

To log an artifact, you first create an Artifact object with a name, type, and optionally a description and a metadata dictionary. You can then add any of these to the artifact object:
  • local files
  • local directories
  • wandb Data Types (e.g. wandb.Plotly or wandb.Table) which will render alongside the artifact in the UI
  • remote files and directories (e.g. s3 buckets)
# 1. Log a dataset version as an artifact
import wandb

# Initialize a new W&B run to track this job
run = wandb.init(project="artifacts-quickstart", job_type="dataset-creation")

# Create a sample dataset file to log as an artifact
with open('my-dataset.txt', 'w') as f:
    f.write('Imagine this is a big dataset.')

# Create a new artifact, which is a sample dataset
dataset = wandb.Artifact('my-dataset', type='dataset')
# Add files to the artifact, in this case a simple text file
dataset.add_file('my-dataset.txt')
# Log the artifact to save it as an output of this run
run.log_artifact(dataset)

wandb.finish()
Each time you log this artifact, W&B will checksum the file assets you add to it and compare that to previous versions of the artifact. If there is a difference, a new version will be created, indicated by the version aliases v0, v1, v2, and so on. Users can optionally add or remove additional aliases through the UI or API. Aliases are important because they uniquely identify an artifact version, so you can use them, for example, to pull down your best model.
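For example, here is a minimal sketch of attaching a custom alias through the public API, reusing the example artifact path from the snippet below (replace it with your own entity, project, and version):
import wandb

api = wandb.Api()
# Fetch a specific artifact version (example path; replace with your own)
artifact = api.artifact('dummy-team/that_was_easy/my-dataset:v3')
# Attach a custom alias so this exact version can be referenced by name later
artifact.aliases.append('best')
artifact.save()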


Consuming Artifacts

To consume an artifact, execute the following:
import wandb
run = wandb.init()
# Indicate we are using a dependency
artifact = run.use_artifact('dummy-team/that_was_easy/my-dataset:v3', type='dataset')
artifact_dir = artifact.download()

Tracking Artifacts By Reference

You may already have large datasets sitting in a cloud object store like S3 and just want to track which versions of those datasets your runs are using, along with any other metadata associated with those datasets. You can do so by logging these artifacts by reference, in which case W&B only tracks the checksums and metadata of an artifact and does not copy the entire data asset to W&B. Here are some more details on tracking artifacts by reference.
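As a minimal sketch, assuming a hypothetical S3 prefix (W&B checksums the referenced objects but leaves the data in place):
import wandb

run = wandb.init(project="artifacts-by-reference", job_type="dataset-tracking")

# Create an artifact whose contents are a reference to objects that stay in S3
dataset = wandb.Artifact('mountain-dataset', type='dataset')
dataset.add_reference('s3://my-bucket/datasets/mountains')

# Only checksums and metadata are stored with W&B, not the files themselves
run.log_artifact(dataset)
run.finish()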
With artifacts you can now refer to arbitrary data assets through durable and simple names and aliases (similar to how you deal with Docker containers). This makes it really easy to hand off these assets between people and processes and see the lineage of all data, models, and results.
If you're working with multiple component artifacts and would like to track the lineage of the collection of component artifacts in the form of a 'super artifact' - check out this colab here.



Registry

W&B Registry is a curated central repository that stores assets and provides versioning, aliases, lineage tracking, and governance for them. Registry allows individuals and teams across the entire organization to share and collaboratively manage the lifecycle of all models, datasets, and other artifacts. The registry can be accessed directly in SaaS by visiting https://wandb.ai/registry or on your private instance through <host-url>/registry.
W&B Registry home page

Registry Types

W&B supports two types of registries: Core registries and Custom registries.
Core registry
A core registry is a template for specific use cases: Models and Datasets.
By default, the Models registry is configured to accept "model" artifact types and the Dataset registry is configured to accept "dataset" artifact types.
Custom registry
Custom registries are not restricted to "model" or "dataset" artifact types; they can accept any user-defined type.
After creating a registry, you store individual collections of your assets in it for tracking.
Collection
A collection is a set of linked artifact versions in a registry. Each collection represents a distinct task or use case and serves as a container for a curated selection of artifact versions related to that task.
Below is a diagram demonstrating how the registry integrates with your existing organization, teams, and projects:


Creating a Collection

Collections can be created programmatically or directly through the UI. Below, we'll cover programmatic creation. For the manual creation process through the UI, visit the Interactively create a collection section in the W&B docs.
W&B automatically creates a collection with the name you specify in the target path if you try to link an artifact to a collection that does not exist. The target path consists of the entity of the organization, the prefix "wandb-registry-", the name of the registry, and the name of the collection:
f"{org_entity}/wandb-registry-{registry_name}/{collection_name}"
The following code snippet shows how to programmatically create a collection. Replace values enclosed in <> with your own:
import wandb

# Initialize a run
run = wandb.init(entity="<team_entity>", project="<project>")

# Create an artifact object
artifact = wandb.Artifact(name="<artifact_name>", type="<artifact_type>")

# Define required registry definitions
org_entity = "<organization_entity>"
registry_name = "<registry_name>"
collection_name = "<collection_name>"
target_path = f"{org_entity}/wandb-registry-{registry_name}/{collection_name}"

# Link the artifact to a collection
run.link_artifact(artifact=artifact, target_path=target_path)

run.finish()
After creating your registry collection, you can programmatically link artifact versions to it. Linking an artifact to a registry collection brings that artifact version from a private, project-level scope to the shared, organization-level scope.
Linking artifacts to a registry can be done programmatically or directly through the UI. Below, we'll cover programmatic linking. For the manual linking process through the UI, visit the "Registry App" and "Artifact browser" tabs of the How to link an artifact version section in the W&B docs.
Before you link an artifact to a collection, ensure that the registry that the collection belongs to already exists.
Use the target_path parameter to specify the collection and registry you want to link the artifact version to. The target path consists of:
{ORG_ENTITY_NAME}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}
Copy and paste the code snippet below to link an artifact version to a collection within an existing registry. Replace values enclosed in <> with your own:
import wandb

# Define team and org entities
TEAM_ENTITY_NAME = "<team_entity_name>"
ORG_ENTITY_NAME = "<org_entity_name>"

REGISTRY_NAME = "<registry_name>"
COLLECTION_NAME = "<collection_name>"

run = wandb.init(entity=TEAM_ENTITY_NAME, project="<project_name>")

# Create the artifact version and add its files
artifact = wandb.Artifact(name="<artifact_name>", type="<collection_type>")
artifact.add_file(local_path="<local_path_to_artifact>")

# Link the artifact version to the registry collection
target_path = f"{ORG_ENTITY_NAME}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}"
run.link_artifact(artifact=artifact, target_path=target_path)

run.finish()

Download and use an artifact from a registry

Use the W&B Python SDK to download and use an artifact that you linked to the W&B Registry.
Replace values within <> with your own:
import wandb

ORG_ENTITY_NAME = '<org-entity-name>'
REGISTRY_NAME = '<registry-name>'
COLLECTION_NAME = '<collection-name>'
ALIAS = '<artifact-alias>'
INDEX = '<artifact-index>'

run = wandb.init() # Optionally use the entity, project arguments to specify where the run should be created

registered_artifact_name = f"{ORG_ENTITY_NAME}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:{ALIAS}"
registered_artifact = run.use_artifact(artifact_or_name=registered_artifact_name) # marks this artifact as an input to your run
artifact_dir = registered_artifact.download()
Reference an artifact version with one of the following formats:
# Artifact name with version index specified
f"{ORG_ENTITY}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:v{INDEX}"

# Artifact name with alias specified
f"{ORG_ENTITY}/wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:{ALIAS}"
Where:
  • latest - Use the latest alias to specify the most recently linked version.
  • v# - Use v0, v1, v2, and so on to fetch a specific version in the collection.
  • alias - Specify a custom alias attached to the artifact version.
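For example, here is a minimal sketch that pulls the most recently linked version from a collection using the latest alias (the organization, registry, and collection names are placeholders):
import wandb

run = wandb.init()

# "latest" resolves to the version most recently linked to the collection
name = "acme-org/wandb-registry-model/image-classifiers:latest"
registered_artifact = run.use_artifact(artifact_or_name=name)
artifact_dir = registered_artifact.download()

run.finish()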



Panel Management for W&B Models 📈


What is it? Now you can easily declutter your workspace by removing unnecessary panels and adding back only the metrics and keys that matter most to you.
Why you'll love it:
  • Enhanced Usability: Quickly remove automatically added panels while keeping your custom ones intact.
  • Simplified Focus: Tailor your workspace to display only the data that's relevant to you.
  • Improved Performance: Reduce data load for a faster, smoother experience, especially beneficial for large workspaces.
🔧 Pro Tip: We highly recommend using Panel Management to maintain optimal workspace performance for large workspaces.



Saved Views in Workspaces

In large workspaces (many runs, many metrics, many steps), it can be tedious to repeatedly filter, group, and toggle the visibility of runs. Create a Saved View to organize your preferred setup of charts and data, so you won't have to redo any of the filtering, grouping, or visibility operations when loading your workspace. For more details, read the documentation here.

Create a new saved workspace view

  1. Navigate to a personal workspace or a saved view.
  2. Make edits to the workspace.
  3. Click on the meatball menu (three horizontal dots) at the top right corner of your workspace. Click on Save as a new view.
New saved views appear in the workspace navigation menu:





Greater Searchability of Runs/Panels 🔎

What is it? We updated our search bars to remember your most recent run and panel search strings and to provide type-ahead auto-completion when you focus on these search bars.
