
How to Train ML Models with SageMaker Studio Lab, a Powerful Jupyter-based ML Platform

How to train your ML models using this new AWS tool.
Created on March 16 | Last edited on June 30

Amazon Sagemaker Studio Lab (SMSL) is the new web-based platform for machine learning practitioners. It is a free machine learning (ML) development environment that provides the compute, storage, and security—all at no cost—for anyone to learn and experiment with ML. Similar to Google Colab, it provides a familiar Jupyter-based web interface that we're all so used to and comfortable with.
No AWS account is needed, nor any cloud infrastructure skills. To get started, simply request an account with a valid email address. While the interface will feel familiar, there are some key differentiators:
  • SMSL provides a full JupyterLab instance, with the standard shortcuts, widgets, extensions, git support, integration with GitHub, and python environments.
  • A single SMSL user session can leverage either 12 hours of CPU or 4 hours of GPU resources. There is no limit on the number of user sessions per customer account.
  • The environment is persistent and you get 15GB of storage, so when you come back to work you can start where you left off.
Once you log in to SMSL, you will be on your default project page, where you can select a compute type and “Start runtime.” We recommend starting with CPU instances and leveraging GPU instances only as needed.
💡 A sister blog post on SageMaker Studio (not Lab) is available on the AWS ML blog.

Importing an external project/repo from GitHub

We are interested in running our own code, so let's import some notebooks from GitHub.
  1. We will use this repo as an example and load one of its notebooks. To do this, you have to prepend the URL...
https://studiolab.sagemaker.aws/import/github
...to the path of the notebook you are trying to run. In my case it looks like this:
tcapelle/aws_smsl_demo/blob/main/01_data_processing.ipynb
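Putting the two together, the full URL to open this notebook in Studio Lab is:
https://studiolab.sagemaker.aws/import/github/tcapelle/aws_smsl_demo/blob/main/01_data_processing.ipynb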
NB: You can make your life easier by adding an "Open in Studio Lab" banner at the top of the notebook with a markdown cell containing: [![Open In Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/tcapelle/aws_smsl_demo/blob/main/01_data_processing.ipynb)
2. Once the notebook is rendered on the SMSL website, you will be asked to "Copy to Project".

When you do this, as the notebook lives in a GitHub repo, SMSL will ask you if you want to clone the whole repo into your workspace. I found it very useful to clone the full repo, and since my repo contained an environment.yml file, SMSL also offered to create the conda environment for me, which is a nice, thoughtful feature. 😊

💡 You may get a warning that your browser is blocking the pop-up page (SMSL is trying to launch the JupyterLab website for you 🪐).
🧐 SMSL will call the command conda env create -f environment.yml for you automatically

Quick look at conda environment.yml files

As per the conda documentation, we specify the libraries that we want to live inside our environment in a YAML file. This is a nice way of making sure your code is runnable elsewhere. You can call this file whatever you like (my_env.yml, requirements.yml, etc.), but if you follow the environment.yml naming convention, SMSL will pick it up automatically.
Let's take a quick look at what the file looks like:
name: wandb
channels:
- pytorch
- conda-forge
dependencies:
- torchvision>=0.8
- matplotlib
- numpy
- pandas>=1.0.0
- scikit-learn
- pytorch>=1.7.0
- wandb
- ipykernel
- tqdm
- ipywidgets
🚀 It's recommended that you include ipykernel as a dependency so you can choose the environment inside JupyterLab whenever you launch a notebook.
When you open a notebook, it will ask you which environment you want to use to run the Python kernel.


The Terminal $>

One of the coolest aspects of this type of workspace is that you get full access to the underlying VM via a terminal. Just click the ✚ sign and create a new terminal. You can manage your conda environments from here and install any other dependencies you may need.


Some useful commands are:
  • If you run out of space, you're done! So use $ df -h to check whether you are running low on storage. I am at 98% in /home/studio-lab-user 😱

  • conda commands in general are useful. Environments take up a lot of space, so consider deleting the ones you no longer need:
    • conda env list to get a list of all environments
    • conda env remove --name="pepito" to remove an environment named pepito
    • conda env create ... to create new envs
  • This is an Ubuntu/Debian machine, so apt-get is your friend for installing programs.

💡 One More JupyterLab 🪐 tip

You can git diff notebooks without issues using the integrated jupyterlab-git pane.


Using Weights & Biases in SMSL 🪐

Weights & Biases (wandb) is just a regular Python library. Once installed, it's as simple as adding a couple of lines of code to your training script, and you will be logging experiments. You can install it manually by doing:
$ pip install wandb
Or by adding wandb as a dependency in your environment.yml file.
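As a quick illustration, here is a minimal sketch of those couple of lines; the project name and logged values below are made up for the example:
import wandb

# Start a run (you will be asked to log in or paste an API key the first time)
run = wandb.init(project="smsl-demo", config={"lr": 1e-3, "epochs": 5})

for epoch in range(run.config.epochs):
    train_loss = 1.0 / (epoch + 1)  # placeholder for your real training loop
    wandb.log({"epoch": epoch, "train_loss": train_loss})

run.finish()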

Intro notebook

To start using Weights & Biases (wandb) straight away inside your SMSL workspace, just click on the banner below:

Copy the notebook to the project and install the necessary dependencies (this is a PyTorch classification example).

Case Study: Semantic Segmentation for Autonomous Vehicles

We will use this repo to train a model that performs semantic segmentation on the Cambridge-driving Labeled Video Database (CamVid) dataset. You can click here to copy the repo to your SMSL workspace.

The dataset

We use the Cambridge-driving Labeled Video Database (CamVid) for this example. It contains a collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes. We can version our dataset as a wandb.Artifact so we can reference it later. See the following code:
import wandb

with wandb.init(project="sagemaker_camvid_demo", job_type="upload"):
    # `path` points to the local CamVid folder and `class_labels` maps class IDs
    # to names; both are assumed to be defined earlier in the notebook.
    artifact = wandb.Artifact(
        name="camvid-dataset",
        type="dataset",
        metadata={
            "url": "https://s3.amazonaws.com/fast-ai-imagelocal/camvid.tgz",
            "class_labels": class_labels,
        },
        description=(
            "The Cambridge-driving Labeled Video Database (CamVid) is the first "
            "collection of videos with object class semantic labels, complete with "
            "metadata. The database provides ground truth labels that associate "
            "each pixel with one of 32 semantic classes."
        ),
    )
    artifact.add_dir(path)
    wandb.log_artifact(artifact)
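Since the dataset is now versioned as an artifact, any later run can pull it back down. Here is a minimal sketch; the job type and local directory handling are illustrative:
import wandb

with wandb.init(project="sagemaker_camvid_demo", job_type="train") as run:
    # Fetch the latest version of the dataset artifact and download it locally
    artifact = run.use_artifact("camvid-dataset:latest")
    dataset_dir = artifact.download()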
💡 Follow along in 01_data_processing.ipynb
We will also log a wandb.Table version to have access to an interactive visualization of the data. You can use tables to understand your datasets, visualize model predictions, and share insights in a central dashboard. W&B Tables support many rich media formats, like image, audio, and waveforms. For a full list of media formats, refer to Data Types.
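As an illustration, a table of images with their ground-truth segmentation masks might be built along these lines; the local file layout and the label_mask_for() helper below are hypothetical, not the exact code from the notebook:
from pathlib import Path

import numpy as np
from PIL import Image
import wandb

with wandb.init(project="sagemaker_camvid_demo", job_type="data_viz"):
    table = wandb.Table(columns=["file_name", "image"])

    # Hypothetical local layout for the downloaded CamVid images
    for image_file in sorted(Path("camvid/images").glob("*.png")):
        image = np.array(Image.open(image_file))
        # label_mask_for() is a hypothetical helper returning the per-pixel class IDs
        mask = np.array(Image.open(label_mask_for(image_file)))
        table.add_data(
            image_file.name,
            wandb.Image(
                image,
                # class_labels is the same id -> name mapping used for the artifact above
                masks={"ground_truth": {"mask_data": mask, "class_labels": class_labels}},
            ),
        )

    wandb.log({"camvid_dataset": table})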



Training a model

We can now create a model and train it. We will use PyTorch and fastai to quickly prototype a baseline and then use wandb.Sweeps to explore a better model.
The model is supposed to learn a per-pixel annotation of a scene captured from the point of view of the autonomous agent. The model needs to categorize or segment each pixel of a given scene into 32 relevant categories such as road, pedestrian, sidewalk, and car, as listed below. You can click on any of the segmented images in the table shown above to access an interactive interface for exploring the segmentation results and categories.

For the baseline experiments we decided to use a simple architecture inspired by UNet with different backbones from timm. We performed the experiments with focal loss. Below is a brief summary of our experiments with the baseline models and the loss functions:
💡 We will need a GPU backend for these notebooks; we can check with the nvidia-smi command.

[Embedded W&B panel: Experiments, run set of 12]
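For reference, here is a minimal sketch of how such a baseline might be put together with fastai and W&B. It uses fastai's built-in CamVid download and a torchvision ResNet backbone rather than the timm backbones and exact training code from the repo, so treat it as an illustration only:
from fastai.vision.all import *
from fastai.callback.wandb import WandbCallback
import wandb

# Download CamVid via fastai (the repo instead pulls it from the W&B artifact)
path = untar_data(URLs.CAMVID)
codes = np.loadtxt(path / "codes.txt", dtype=str)

dls = SegmentationDataLoaders.from_label_func(
    path,
    bs=8,
    fnames=get_image_files(path / "images"),
    label_func=lambda o: path / "labels" / f"{o.stem}_P{o.suffix}",
    codes=codes,
)

wandb.init(project="sagemaker_camvid_demo", job_type="train")

# A UNet-style model with a ResNet backbone and focal loss, logging metrics to W&B
learn = unet_learner(
    dls,
    resnet34,
    loss_func=FocalLossFlat(axis=1),
    metrics=[foreground_acc, DiceMulti()],
    cbs=[WandbCallback(log_preds=False)],
)
learn.fine_tune(5)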

Visualizing Model Outputs

Weights & Biases really shines for assessing model performance: we can use the power of wandb.Table to visualize where our model is doing poorly. Here, we see the model predictions alongside the ground truth and the per-class IoU scores. We can filter and sort to see where the model is failing to detect vulnerable pedestrians 🚶‍♀️ and bicycles 🚲.
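As a rough sketch, such a predictions table could be logged like this; validation_samples, predict_mask() and class_iou() are hypothetical stand-ins for the dataloader, model inference, and per-class IoU code in the repo:
import wandb

with wandb.init(project="sagemaker_camvid_demo", job_type="evaluation"):
    pred_table = wandb.Table(columns=["image", "iou_pedestrian", "iou_bicyclist"])

    for image, target in validation_samples:  # hypothetical iterable of numpy arrays
        pred = predict_mask(image)             # hypothetical inference helper
        pred_table.add_data(
            wandb.Image(
                image,
                masks={
                    "prediction": {"mask_data": pred, "class_labels": class_labels},
                    "ground_truth": {"mask_data": target, "class_labels": class_labels},
                },
            ),
            class_iou(pred, target, "pedestrian"),  # hypothetical per-class IoU helper
            class_iou(pred, target, "bicyclist"),
        )

    wandb.log({"predictions": pred_table})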



Hyperparameter Optimisation with wandb.Sweeps

In order to improve the performance of the baseline model, we need to not only select the best model, but also the best set of hyperparameters to train it with. This, in spite of being quite a daunting task, was made easy for us by W&B Sweeps.
We perform a Bayesian hyperparameter search with the goal of maximizing the foreground accuracy of the model on the validation dataset. To do this, we define the following configuration file, sweep.yaml. In it we set the search method (bayes) and the parameters and values to search over: different backbones, batch sizes, and loss functions, as well as optimization parameters (learning rate and weight decay) sampled from a distribution.
# sweep.yaml
program: train.py
project: sagemaker_camvid_demo
method: bayes
metric:
  name: foreground_acc
  goal: maximize
early_terminate:
  type: hyperband
  min_iter: 5
parameters:
  backbone:
    values: ["mobilenetv2_100", "mobilenetv3_small_050", "mobilenetv3_large_100", "resnet18", "resnet34", "resnet50", "vgg19"]
  batch_size:
    values: [8, 16]
  image_resize_factor:
    value: 4
  loss_function:
    values: ["categorical_cross_entropy", "focal", "dice"]
  learning_rate:
    distribution: uniform
    min: 1e-5
    max: 1e-2
  weight_decay:
    distribution: uniform
    min: 0.0
    max: 0.05

Afterwards, in a terminal, launch the sweep using the wandb command line:
$ wandb sweep sweep.yaml --project="sagemaker_camvid_demo"
And then launch a sweep agent on this machine by doing:
$ wandb agent <sweep_id>
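For context, the train.py script the sweep launches typically reads its hyperparameters from wandb.config. A minimal, hypothetical skeleton (not the actual script in the repo) looks like this:
# train.py (hypothetical skeleton)
import wandb

def main():
    # The sweep agent injects the sampled hyperparameters into wandb.config
    with wandb.init(project="sagemaker_camvid_demo") as run:
        cfg = run.config
        print(f"Training {cfg.backbone} with bs={cfg.batch_size}, "
              f"lr={cfg.learning_rate}, wd={cfg.weight_decay}, loss={cfg.loss_function}")

        # ... build the dataloaders and model here, then train and log metrics,
        # including the metric the sweep optimizes:
        foreground_acc = train_and_evaluate(cfg)  # hypothetical training helper
        wandb.log({"foreground_acc": foreground_acc})

if __name__ == "__main__":
    main()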
Once the sweep has finished, we can use a parallel coordinates plot to explore the performance of the models with various backbones and different sets of hyperparameters, and based on that we can see which model performs best.


[Embedded W&B panel: Run set of 56]

We can derive the following key insights from the sweep:
  • A lower learning rate and lower weight decay result in better foreground accuracy and Dice scores.
  • Batch size has strong positive correlations with the metrics.
  • The VGG-based backbones might not be a good option to train our final model because they're prone to vanishing gradients. (They were filtered out as the loss diverged.)
  • The ResNet backbones result in the best overall performance with respect to the metrics.
  • The ResNet34 or ResNet50 backbone should be chosen for the final model due to their strong performance in terms of metrics.

Conclusion

We hope you enjoyed this quick introduction to SageMaker Studio Lab and that you can leverage it alongside W&B to track your machine learning experiments. If you have any questions about how these tools work together, toss them into the comments and we'll make sure we get to them. Thanks for reading!
