Using AWS Sagemaker and Weights & Biases Together on Digit Recognition with MNIST

We've received a few questions about how W&B works with Amazon SageMaker. Here's a quick tutorial on a simple dataset to get you started.
Costa Huang


In this tutorial, we'll show how easy it is to integrate Weights & Biases to track experiments orchestrated with AWS SageMaker. The source code for this tutorial can be found at . If you're unfamiliar with either tool, here's a quick introduction before we get going:
SageMaker is a comprehensive machine learning service. It helps data scientists and developers prepare, build, train, and deploy high-quality machine learning (ML) models by providing a rich set of tools and features.
Weights & Biases augments the machine learning workflow by bringing experiment tracking, dataset versioning, and project collaboration to a new level.

Integrating Weights & Biases to AWS SageMaker

Integrating W&B is straightforward: as few as five lines of code added to your training script are enough to track experiments. For example:
```python
import wandb

wandb.init(project='gpt3')          # 1. Start a W&B run
wandb.config.learning_rate = 0.01   # 2. Save model inputs and hyperparameters

# ... Model training code here

wandb.log({"loss": loss})           # 3. Log metrics over time to visualize performance
```
To showcase how easy this integration is, we'll take the official AWS SageMaker example on training an MNIST model with PyTorch and make a few small modifications.
1) Adding Weights & Biases to the Training code
We'll be adding a few lines of code to incorporate Weights & Biases for experiment tracking. In the panel below, the left side shows the original training code from the AWS SageMaker example repo, while the right side shows the 8 lines of code we added to incorporate W&B for experiment tracking.
2) Pass Weights & Biases API Key to the Estimator
Sign up for a free W&B account, then retrieve your API key from your account settings.
You should see something like the screenshot above. Simply copy the API key and paste it into the following code block.
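Rather than hard-coding the key into the notebook, one option (an illustrative sketch, not part of the original example) is to read it from your local environment:

```python
import os

# Read the key from the local environment rather than hard-coding it;
# this assumes you've run `export WANDB_API_KEY=...` in your shell first.
current_api_key = os.environ.get("WANDB_API_KEY", "")
```

This keeps the key out of the notebook file, which matters if you share or commit it.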
```python
estimator = PyTorch(
    entry_point='',
    source_dir="src",
    role=role,
    py_version='py3',
    framework_version='1.8.0',
    instance_count=1,
    instance_type='ml.c5.2xlarge',
    hyperparameters={'epochs': 1, 'backend': 'gloo'},
    # Pass the Weights & Biases API key as an environment variable
    environment={"WANDB_API_KEY": current_api_key},
)
```
First, run the following to train the estimator:

```python
estimator.fit({'training': inputs})
```
Then, you'll see your W&B run in your dashboard. From here, you can check out a variety of metrics such as losses, system metrics, and gradient histograms:

Case Study: Visualizing Dataset and Predictions

While seeing the metrics above is definitely helpful, sometimes we want to see exactly how our model is making predictions. And that's where W&B Tables come in handy! What are Tables exactly?
W&B Tables is a tool to help understand your datasets and visualize model predictions. Specifically, a W&B Table (wandb.Table) is a two dimensional grid of data where each column has a single type of data—think of this as a more powerful DataFrame. Tables support primitive and numeric types, as well as nested lists, dictionaries, and rich media types. Log a Table to W&B, then query, compare, and analyze results in the UI.
So in our use case, we could use the W&B Table to visualize predictions. In the panel below, we added the Table code on the right-hand side.
As a result, we can visualize the Table below. It tells us the label, prediction, and the probability of our predictions. For example, our model is 99.82% confident that the first image is a 7.

Case Study: Learning Rate's Effect on Performance

Now that we have our simple, initial experiment finished, it’s time to do something more interesting. We are going to experiment with five learning rates and see their effects on performance.
The code is fairly straightforward: we just need to create multiple estimators with different learning rates and run each one in a non-blocking manner via wait=False.
```python
lrs = [0.01, 0.001, 0.05, 0.1, 0.2]
for lr in lrs:
    estimator = PyTorch(
        entry_point='',
        source_dir="src",
        role=role,
        py_version='py3',
        framework_version='1.8.0',
        instance_count=1,
        instance_type='ml.c5.2xlarge',
        hyperparameters={'epochs': 10, 'backend': 'gloo', 'lr': lr},
        environment={"WANDB_API_KEY": current_api_key},
    )
    estimator.fit({'training': inputs}, wait=False)
```
The script above should take less than 10 seconds to finish, and in the training jobs section of the SageMaker panel, you should see jobs being spun up.
To do the analysis, we simply group the experiments by the learning-rate config lr and create panels for training/loss and testing/loss. Immediately, we see insights on the experiments: too large a learning rate (e.g., lr=0.2) or too small a learning rate (e.g., lr=0.001) hurts the testing/loss.


We hope you enjoyed the ride! These case studies are, of course, just examples; you could dig into other insights, such as how other hyperparameters affect performance or which classes your model struggles with. The point is, since W&B logs every important metric during your model training runs, you'll be able to analyze your performance and iterate toward better models more quickly.
While SageMaker is a great tool for building and deploying models, W&B helps you track your experiments better: versioning your datasets, visualizing findings, and unlocking collaboration by letting anyone on your team see your experiments and build on top of them.
We've always prioritized making W&B work well with different frameworks and services, and SageMaker is no exception. Whatever models you're training and deploying on SageMaker, chances are W&B can help you make them better, faster.