
How to set up Launch

Please refer to the official W&B documentation for more information.

What is Launch?

With W&B Launch, you can easily scale training runs from your desktop to a compute resource such as Amazon SageMaker, Kubernetes, and more. Once W&B Launch is configured, you can quickly run training scripts and model evaluation suites, prepare models for production inference, and more with a few clicks and commands.

Launch is composed of three fundamental components: launch jobs, queues, and agents. A launch job is a blueprint for configuring and running tasks in your ML workflow. Once you have a launch job, you can add it to a launch queue. A launch queue is a first-in, first-out (FIFO) queue where you configure and submit your jobs to a particular target compute resource, such as Amazon SageMaker or a Kubernetes cluster.
As jobs are added to the queue, one or more launch agents poll that queue and execute each job (as a Docker image) on the system targeted by the queue.
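As a rough end-to-end sketch of those three components (the names are placeholders, and the wandb launch flags shown for queue submission are an assumption to verify against your CLI version; submission can also be done from the W&B UI):
# Step 1: create a launch job from your code
wandb job create --project "<project-name>" -e "<your-entity>" --name "<job-name>" code "<path-to-code>"

# Step 2: submit the job to a launch queue (also possible from the W&B UI)
wandb launch -q "<queue-name>" -j "<your-entity>/<project-name>/<job-name>:latest"

# Step 3: start an agent that polls the queue and runs jobs as Docker images
wandb launch-agent -e "<your-entity>" -q "<queue-name>"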


Step 1: Create a job

A job is a blueprint that contains contextual information about the W&B run it is created from, such as the run's source code, software dependencies, hyperparameters, artifact version, and so forth.
There are three main ways to create a job.

Artifact-based (or code-based) jobs [easy]

Definition: Code and other assets are saved as a W&B artifact.
Python SDK
<train.py>
import wandb

config = {"epochs": 10}

entity = "<your entity>"
project = "launch-quickstart"
job_name = "walkthrough_example"

settings = wandb.Settings(job_name=job_name)

with wandb.init(
    entity=entity, config=config, project=project, settings=settings
) as run:
    config = wandb.config
    for epoch in range(1, config.epochs):
        loss = config.epochs / epoch
        accuracy = (1 + (epoch / config.epochs)) / 2
        wandb.log({"loss": loss, "accuracy": accuracy, "epoch": epoch})

    # Log the source code to the run so W&B can create a code-based launch job from it
    run.log_code()
Run the script to create the run and its code artifact:
python train.py
CLI
wandb job create --project "<project-name>" -e "<your-entity>" \
--name "<name-for-job>" code "<path-to-script/code.py>"
Ensure the directory containing your Python script has a requirements.txt file listing the Python dependencies required to run your code. A Python runtime is also required; it can either be specified manually with the runtime parameter or auto-detected from a runtime.txt or .python-version file.
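For example, alongside the train.py above, the directory could contain files like these (the contents are placeholders for your real dependencies and Python version):
<requirements.txt>
wandb
<.python-version>
3.10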
💡
You can specify a name for your job with the WANDB_JOB_NAME environment variable. You can also specify a name by setting the job_name parameter in wandb.Settings and passing it to wandb.init. For example:
settings = wandb.Settings(job_name="my-job-name")
wandb.init(settings=settings)
If you do not specify a name, W&B automatically generates a launch job name for you. The job name is formatted as follows: job-<code-artifact-name>.
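For example, with the environment-variable approach, the same script can be launched as:
WANDB_JOB_NAME="my-job-name" python train.py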

Image-based jobs

Definition: Code and other assets are baked into a Docker image.
To create an image-based job, you must first build the Docker image. The image build context should contain the source code and supporting files (the Dockerfile, requirements.txt, and so on) required to execute the W&B run.
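As a minimal sketch (assuming the train.py and requirements.txt from the previous section), such a Dockerfile could look like:
<Dockerfile>
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY train.py .
ENTRYPOINT ["python", "train.py"]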
You can create a Docker image called fashion-mnist with the docker build command:
docker build . -t fashion-mnist
CLI
wandb job create --project "<project-name>" --entity "<your-entity>" \
--name "<name-for-job>" image image-name:tag
or
Docker run
docker run -e WANDB_PROJECT="<project-name>" \
-e WANDB_ENTITY="<your-entity>" \
-e WANDB_API_KEY="<your-wandb-api-key>" \
-e WANDB_DOCKER="<docker-image-name>" image:tag

Git-based or local jobs (sample code)

Definition: Code and other assets are cloned from a certain commit, branch, or tag in a git repository.
wandb job create \
-p $WB_PROJECT \
-e $WB_ENTITY \
-n "First_Git_Job" git https://github.com/wandb/launch-jobs/tree/main/jobs/fashion_mnist_train \
-E 'job.py'

Step 2: Add your launch job to a queue

This section walks through creating a queue step by step.
If you use Docker for your Launch setup and need GPU access, you must specify "gpu": "all" in the queue configuration. For details, please refer to each Launch setup option below.
💡

Docker

When you use Docker with W&B Launch, W&B first builds an image and then runs a container from that image with the docker run <image-uri> command. The queue configuration is interpreted as additional arguments that are passed to the docker run command.
This setup is common for users who run experiments on their local machine, or who have a remote machine that they SSH into to submit launch jobs.
Forgetting "gpu": "all" in the queue configuration tends to be a pitfall: without it, the container will not have GPU access.
💡
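As a minimal sketch of such a queue configuration (the keys are passed through to docker run, so check the Launch docs for your version for the exact names):
{
    "gpu": "all"
}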

SageMaker

You can use W&B Launch to submit launch jobs to Amazon SageMaker to train machine learning models using provided or custom algorithms on the SageMaker platform. SageMaker takes care of spinning up and releasing compute resources, so it can be a good choice for teams without an EKS cluster.
Launch jobs sent to a W&B Launch queue connected to Amazon SageMaker are executed as SageMaker Training Jobs with the CreateTrainingJob API. Use the launch queue configuration to control arguments sent to the CreateTrainingJob API.
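As a rough sketch (the role ARN, bucket, and instance type are placeholders, and the exact queue configuration schema should be checked against the Launch docs for your version), such a queue configuration mirrors CreateTrainingJob arguments:
{
    "RoleArn": "arn:aws:iam::<account-id>:role/<sagemaker-execution-role>",
    "ResourceConfig": {
        "InstanceType": "ml.m5.large",
        "InstanceCount": 1,
        "VolumeSizeInGB": 2
    },
    "OutputDataConfig": {
        "S3OutputPath": "s3://<your-bucket>/sagemaker-outputs"
    },
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 3600
    }
}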
Where to run the launch agent:
    1. [EKS] For production workloads and for customers who already have an EKS cluster, W&B recommends deploying the Launch agent to the EKS cluster using the official Launch agent Helm chart.
    2. [EC2] For production workloads without an existing EKS cluster, an EC2 instance is a good option. Although the launch agent instance keeps running all the time, the agent needs no more than a t2.micro-sized EC2 instance, which is relatively affordable (see the sketch after this list for a minimal setup).
    3. [Local] For experimental or solo use cases, running the Launch agent on your local machine can be a fast way to get started.
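For the EC2 and local options, the setup can be as small as the following sketch (assuming AWS credentials for SageMaker are already available on the machine, for example via an instance role or aws configure):
# the launch extra installs the agent's additional dependencies
pip install "wandb[launch]"
wandb login
wandb launch-agent -e "<your-entity>" -q "<queue-name>"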


Vertex AI

You can also send launch jobs to Google Cloud Vertex AI; the basic concept is similar to that of SageMaker. Please check the W&B documentation if you are interested.

Kubernetes

There is also a way to set up Launch with a Kubernetes cluster; again, the basic concept is similar, and the W&B documentation walks through the setup.

Step 3: Set up an agent

In general, the command to start a launch agent is:
wandb launch-agent -e <entity-name> -q <queue-name>
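Instead of passing everything on the command line, the agent can also read a config file. As a rough sketch (the default path and the keys below are assumptions to verify against the Launch docs for your W&B version):
<~/.config/wandb/launch-config.yaml>
# maximum number of jobs this agent runs in parallel; -1 means no limit
max_jobs: 1
entity: <entity-name>
queues:
  - <queue-name>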