Introducing W&B Launch
We’re rolling out a big new addition to the Weights & Biases platform. Here’s what you need to know:
Created on March 14 | Last edited on March 14
Introduction
Models with ever-larger parameter counts are becoming more widespread across every industry and organization doing machine learning today. As a result, model training is rapidly becoming more computationally intensive and will have to move off your local laptop into a remote compute environment with access to more, or better, GPUs.
Our mission here at Weights & Biases is to build the best tools for machine learning teams. To learn what we should build next, we often try to embed ourselves with practitioner teams at different organizations, all with different workflows and structures. And one common theme has emerged recently:
Practitioners don’t have easy access to the compute resources they need to scale up and out their ML workflows.

Practitioners are either spending too much time managing their own compute environments, coordinating with their MLOps engineers to get access to compute resources and Kubernetes clusters, or, worst of all, simply waiting for model training jobs to finish on a single node on their local machine.
We’re thrilled to help both ML practitioners and MLOps teams tackle that problem with the introduction of W&B Launch, a workflow connector that automatically packages up your code and launches a job into any target environment. Send your ML training job to a single machine with access to better GPUs, or to a cluster like Amazon EKS where multiple machines work on your job in parallel. In short, Launch provides easy access to compute, with none of the complexity.

How W&B Launch Works
All it takes is a simple one-time configuration by your MLOps team: connect Launch to your infrastructure and compute environments, create a queue for each cluster, and activate them. Those queues can then be used and reused by ML practitioners to dramatically scale up model training with just a few clicks.
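As a rough sketch of that one-time setup (the entity and queue names below are hypothetical, and exact CLI flags may differ across wandb versions): a queue is created in the W&B UI, then an agent is started on the target infrastructure to poll that queue and execute submitted jobs:

```shell
# One-time setup sketch -- "my-entity" and "my-queue" are placeholder names.
# The queue itself is created in the W&B UI; the agent below runs on the
# target compute environment and polls that queue for jobs to execute.
pip install wandb
wandb login                                   # authenticate against W&B
wandb launch-agent --queue my-queue --entity my-entity
```

These commands assume a configured W&B account, so they are shown here as a setup fragment rather than a runnable script.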
With Launch, users can easily send jobs to connected target environments with additional compute resources, all from within the familiar, easy-to-use W&B interface. Many of the ML practitioners we spoke with didn't have deep experience configuring and managing Kubernetes environments; Launch abstracts away that complexity.
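Once a queue exists, submitting work takes a single command (or a few clicks in the UI). A minimal sketch, assuming a queue named my-queue and a job source such as a Git repository (both names hypothetical):

```shell
# Submit a training job to a Launch queue -- names are placeholders.
# The job source can be a Git repo, a local directory, or a saved W&B job.
wandb launch --uri https://github.com/my-org/my-training-repo.git \
             --queue my-queue
```

The connected agent picks the job off the queue, builds it, and runs it on the target environment. Like the setup above, this assumes a configured W&B account.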
Launch also provides one-click reproducibility of models by automatically containerizing jobs. These containers provide important environment snapshots, which are especially critical in regulated industries and for organizations with heightened scrutiny on AI governance. Being able to easily reproduce runs, and to go a step further with Sweeps on Launch to quickly change and tune hyperparameters, makes the daily life of an ML practitioner using W&B much more delightful, freeing practitioners to focus on what they care about: model building and experimentation.
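Sweeps on Launch follow the same pattern: a standard W&B sweep configuration is submitted to a queue, and the connected agent runs the hyperparameter trials. A sketch with a hypothetical sweep config and queue name (CLI flags may vary by wandb version):

```shell
# sweep.yaml is an ordinary W&B sweep configuration, for example:
#   method: bayes
#   metric: {name: val_loss, goal: minimize}
#   parameters:
#     learning_rate: {min: 0.0001, max: 0.1}
# Submit it to a Launch queue ("my-queue" is a placeholder):
wandb launch-sweep sweep.yaml --queue my-queue
```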

"Launch greatly simplifies our work optimizing, experimenting with and benchmarking ML methods, letting us focus on reliability and reproducibility of results," said Orlando Avila-Garcia, AI Principal Researcher at ARQUIMEA.
"This reproducibility is critical to debugging loss spikes and data quality problems," echoed Hanlin Tang, CTO and co-founder of MosaicML.
We also recognize the invaluable role of MLOps in creating and maintaining a smooth, efficient and scalable ML workflow, and aim to make their day-to-day lives easier as well with Launch. Following that one-time infra configuration, MLOps engineers can give their practitioners access without having to configure or provision new queues every time they want to launch a model training job. They can still maintain observability and oversight over the whole process of launching jobs, without being a blocker or slowing down the process.
Most importantly, MLOps teams can play a valuable role by baking smart defaults and best practices into the queues and infrastructure resources they manage. They no longer have to spend time debugging code or infrastructure configurations from users who aren't necessarily experts in those areas, or building patchwork internal systems for launching jobs and accessing compute. We think Launch will build a bridge of collaboration between ML practitioners and MLOps.
We invite all users to try out Launch! Connecting to a single node on a local machine is available to all W&B users by toggling a feature flag. Connecting to an external cluster is a premium enterprise feature; reach out to us if you're interested in giving it a test drive!