
Text Classification With AWS SageMaker, HuggingFace, and W&B

This article provides a tutorial on running exploratory data analysis and natural language processing experiments using AWS SageMaker, HuggingFace, and W&B.
Created on August 15|Last edited on December 9

Specifically, we'll look at the Banking77 dataset, which has more than 13,000 customer service queries with 77 intents (hence the dataset name). We will:
  • Explore the Banking77 dataset using W&B Tables
  • Analyze the results of a hyperparameter tuning experiment run on AWS SageMaker, and
  • Display key training and evaluation metrics such as train loss and evaluation accuracy

👉 Click here to see the SageMaker Notebook and training script

👉 Click here to see a live W&B dashboard from this project


Let's get started!


Explore the Banking77 Dataset

First, let's dig into the dataset with W&B Tables. Tables lets you see your data: you can interact with, sort, and evaluate it, uncover the relationships within, and understand your data better than in more static formats.
Here's a table sorted only by a simple id:

[Interactive W&B Table panel: Banking77 run set]

This is just a quick example of what you can do with Tables. This feature was made to be dynamic and customizable. If you'd like to dig in a little more, we recommend the following reports on Tables, with special attention to the Shakespeare report if you'd like more ideas about what you can do with text data.
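If you want to log your own dataset as a Table, a few lines of Python are enough. Here's a minimal sketch; the project name, column names, and helper function are illustrative, not taken from the original notebook:

```python
# Hypothetical sketch of logging a dataset sample as a W&B Table.
# The `id` column makes the table sortable, as in the panel above.

def build_table_rows(queries, intents):
    """Pair each customer query with its intent label, plus an id column."""
    return [[i, q, intent] for i, (q, intent) in enumerate(zip(queries, intents))]

def log_banking77_table(queries, intents):
    # wandb is imported lazily so the pure-Python helper above stays dependency-free
    import wandb

    run = wandb.init(project="banking77-eda")  # illustrative project name
    table = wandb.Table(columns=["id", "query", "intent"],
                        data=build_table_rows(queries, intents))
    run.log({"banking77_sample": table})
    run.finish()
```

Once logged, the table appears in your workspace where you can sort and filter it interactively.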


Hyperparameter Tuning on AWS SageMaker

Next, let's look at the results of a hyperparameter tuning experiment run on AWS SageMaker.
Here, we fine-tuned the HuggingFace models and logged the results with the Weights & Biases HuggingFace Trainer integration. We carried out hyperparameter searches over:
  • Model
  • Warmup steps
  • Learning rate
💡 Note: This section of the report is filtered down to runs with Job Type = "HyperparameterTuning".
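On SageMaker you would normally declare these ranges through the tuner's parameter classes, but the idea is easy to see in a library-agnostic sketch. The candidate models and numeric ranges below are illustrative assumptions, not the values used in the actual experiment:

```python
import random

# Illustrative search space over the three swept dimensions.
# Candidate models and ranges are assumptions for this sketch.
SEARCH_SPACE = {
    "model_name": ["distilbert-base-uncased", "roberta-base"],
    "warmup_steps": [0, 100, 500],
    "learning_rate": (1e-5, 5e-5),  # continuous (low, high) range
}

def sample_config(rng):
    """Draw one random hyperparameter configuration from the space."""
    low, high = SEARCH_SPACE["learning_rate"]
    return {
        "model_name": rng.choice(SEARCH_SPACE["model_name"]),
        "warmup_steps": rng.choice(SEARCH_SPACE["warmup_steps"]),
        "learning_rate": rng.uniform(low, high),
    }

rng = random.Random(0)  # seeded so the sketch is reproducible
configs = [sample_config(rng) for _ in range(4)]
```

Each sampled config would be handed to one training job; with the W&B integration enabled, every job shows up as its own run in the dashboard.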

Parameter Importance

Let's look first at this plot, which shows the importance of each hyperparameter. Hovering over any line will show you the specific values of that particular run.

[Parameter importance and parallel coordinates panels for two run sets]
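W&B computes this importance score for you across all runs in a sweep. As a rough, dependency-free proxy for the same idea, you can take the absolute correlation between each hyperparameter and the target metric. The runs below are made-up toy data for illustration only:

```python
# Rough proxy for parameter importance: absolute Pearson correlation
# between each hyperparameter and the evaluation metric.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Toy sweep results (fabricated for illustration, not real experiment data)
runs = [
    {"learning_rate": 1e-5, "warmup_steps": 500, "eval_accuracy": 0.88},
    {"learning_rate": 2e-5, "warmup_steps": 100, "eval_accuracy": 0.90},
    {"learning_rate": 3e-5, "warmup_steps": 0,   "eval_accuracy": 0.92},
    {"learning_rate": 5e-5, "warmup_steps": 100, "eval_accuracy": 0.86},
]

acc = [r["eval_accuracy"] for r in runs]
importance = {
    k: abs(pearson([r[k] for r in runs], acc))
    for k in ("learning_rate", "warmup_steps")
}
```

A simple correlation misses interactions between parameters, which is why the built-in panel is more reliable, but it gives a quick first read on small sweeps.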

Our Sweeps tool is a great way to understand which parameters actually matter to your model's performance. Let's move on to one more W&B feature quickly.

Model Lineage

It's also important to understand the dependencies and lineage of your data and your models, and W&B Artifacts does just that. Below, you'll see the lineage of each model's weights, from the raw dataset to the processed dataset to the train-eval split, and finally to the trained model weights.

Artifacts, of course, supports larger, more complex flows and allows you to collapse and expand steps for ease of viewing and understanding.
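In code, each edge in a lineage graph comes from a run declaring which artifact it consumed and which it produced. A minimal sketch, assuming wandb is installed; the project, artifact names, and types here are illustrative, not those from the original project:

```python
# Each call records one node in the lineage graph. W&B links the nodes
# automatically because a run both uses an input artifact and logs an output.

def log_lineage_step(name, artifact_type, output_path, input_artifact=None):
    """Log one artifact, optionally declaring the artifact it was built from."""
    import wandb  # imported lazily to keep the sketch self-contained

    run = wandb.init(project="banking77", job_type=f"build-{artifact_type}")
    if input_artifact is not None:
        run.use_artifact(input_artifact)  # records the dependency edge
    artifact = wandb.Artifact(name, type=artifact_type)
    artifact.add_file(output_path)        # attach the produced file
    run.log_artifact(artifact)
    run.finish()

# The chain described above, raw -> processed -> split -> model weights:
# log_lineage_step("banking77-raw", "dataset", "raw.csv")
# log_lineage_step("banking77-processed", "dataset", "processed.csv",
#                  input_artifact="banking77-raw:latest")
# log_lineage_step("banking77-split", "dataset", "split.csv",
#                  input_artifact="banking77-processed:latest")
# log_lineage_step("model-weights", "model", "model.bin",
#                  input_artifact="banking77-split:latest")
```

Because each run records its inputs and outputs, the full graph in the panel above is assembled without any extra bookkeeping on your side.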

Conclusion

We spend a lot of time ensuring our tools play well with others, and SageMaker and HuggingFace are no exception. This article was meant to give you a quick understanding of what W&B can do alongside both. But whatever your infrastructure, we've got you covered.