Text Classification With AWS SageMaker, HuggingFace, and W&B
This article provides a tutorial on running exploratory data analysis and natural language processing experiments using AWS SageMaker, HuggingFace, and W&B.
This article gives a quick introduction to how to use SageMaker, HuggingFace, and Weights & Biases in tandem to run EDA and NLP experiments.

Specifically, we'll look at the Banking77 dataset, which has more than 13,000 customer service queries with 77 intents (hence the dataset name). We will:
- Explore the Banking77 dataset using W&B Tables
- Analyze the results of a hyperparameter tuning experiment run on AWS SageMaker, and
- Display key training and evaluation metrics such as train loss and evaluation accuracy
👉 Click here to see the SageMaker Notebook and training script
👉 Click here to see a live W&B dashboard from this project
Let's get started!
Explore the Banking77 Dataset
First, let's dig into the dataset with W&B Tables. Tables lets you see your data directly: you can interact with it, sort it, and evaluate it, uncovering relationships that are hard to spot in more static formats.
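If you'd like to build a table like this yourself, here's a minimal sketch of logging a sample of Banking77 to W&B Tables. The project name and sample size are illustrative, not taken from the original notebook:

```python
# A minimal sketch: log a sample of the Banking77 dataset as a W&B Table.
# The project name and sample size are illustrative placeholders.
import wandb
from datasets import load_dataset

dataset = load_dataset("banking77", split="train")

run = wandb.init(project="banking77-eda", job_type="eda")

# Build a table with one row per query: id, raw text, and intent label.
table = wandb.Table(columns=["id", "text", "label"])
for i, example in enumerate(dataset.select(range(1000))):
    label_name = dataset.features["label"].int2str(example["label"])
    table.add_data(i, example["text"], label_name)

run.log({"banking77_sample": table})
run.finish()
```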
Here's a table sorted only by a simple id:
[Embedded W&B Table panel showing the Banking77 queries]
This is just a quick example of what you can do with Tables. This feature was made to be dynamic and customizable. If you'd like to dig in a little more, we recommend the following reports on Tables, with special attention to the Shakespeare report if you'd like more ideas about what you can do with text data.
- Tables Tutorial: Visualize Text Data & Predictions (a guide to logging and organizing text data and language model predictions with our old friend William Shakespeare)
- Announcing W&B Tables: Iterate on Your Data (the launch post for W&B Tables, a tool for data iteration and model evaluation)
Hyperparameter Tuning on AWS SageMaker
Next, let's look at the results of a hyperparameter tuning experiment run on AWS SageMaker.
Here, we fine-tuned HuggingFace models and logged the results with the Weights & Biases HuggingFace Trainer integration; sketches of both the training script and the tuning job follow the list below. We carried out hyperparameter searches over:
- Model
- Warmup steps
- Learning rate
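Concretely, the training-script side of this setup might look like the sketch below: a standard HuggingFace Trainer fine-tuning run on Banking77, with report_to="wandb" enabling the W&B integration. The argument names, defaults, and epoch count are assumptions for illustration, not the original training script:

```python
# A minimal sketch of the training-script side: fine-tune a HuggingFace
# model on Banking77 and log to W&B via the Trainer integration.
# Argument names, defaults, and the epoch count are illustrative.
import argparse

import wandb
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

parser = argparse.ArgumentParser()
parser.add_argument("--model_name", default="distilbert-base-uncased")
parser.add_argument("--warmup_steps", type=int, default=0)
parser.add_argument("--learning_rate", type=float, default=2e-5)
args = parser.parse_args()

# Job type matches the filter used in this section of the report.
wandb.init(project="banking77", job_type="HyperparameterTuning")

dataset = load_dataset("banking77")
tokenizer = AutoTokenizer.from_pretrained(args.model_name)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    args.model_name, num_labels=77  # Banking77 has 77 intents
)

training_args = TrainingArguments(
    output_dir="./results",
    report_to="wandb",  # enables the W&B HuggingFace Trainer integration
    learning_rate=args.learning_rate,
    warmup_steps=args.warmup_steps,
    evaluation_strategy="epoch",
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
```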
💡 Note: This section of the report is filtered down to runs with Job Type = "HyperparameterTuning".
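On the SageMaker side, a tuning job over that search space might look like the hedged sketch below. The role ARN, instance type, version pins, metric regex, and job counts are placeholders, not values from the original notebook:

```python
# A hedged sketch of a SageMaker hyperparameter tuning job over a
# HuggingFace estimator. Role, instance type, versions, and the metric
# regex are placeholders; adjust them to your environment.
from sagemaker.huggingface import HuggingFace
from sagemaker.tuner import (
    CategoricalParameter,
    ContinuousParameter,
    IntegerParameter,
    HyperparameterTuner,
)

estimator = HuggingFace(
    entry_point="train.py",  # the training script sketched above
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)

# Search over model, warmup steps, and learning rate, as in the article.
hyperparameter_ranges = {
    "model_name": CategoricalParameter(
        ["distilbert-base-uncased", "bert-base-uncased", "roberta-base"]
    ),
    "warmup_steps": IntegerParameter(0, 500),
    "learning_rate": ContinuousParameter(1e-5, 5e-5),
}

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="eval_accuracy",
    metric_definitions=[
        {"Name": "eval_accuracy", "Regex": "eval_accuracy.*?([0-9.]+)"}
    ],
    hyperparameter_ranges=hyperparameter_ranges,
    max_jobs=20,
    max_parallel_jobs=4,
)

# The script loads Banking77 itself, so no input channels are passed here.
tuner.fit()
```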
Parameter Importance
Let's look first at the plots below, which show the importance of each hyperparameter. Hovering over any line in the parallel coordinates plot will show you the specific values for that particular run.
[Embedded panels: hyperparameter importance and parallel coordinates plots across two run sets]
Our Sweeps tool is a great way to understand which parameters actually matter to your model's performance.
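If you'd rather drive a search like this from W&B itself instead of SageMaker, the same space can be expressed as a sweep configuration. This is a minimal sketch that assumes the train.py interface from the tuning sketch above:

```python
# A minimal W&B Sweeps sketch over the same search space.
# The project name and the train_fn wrapper are illustrative placeholders.
import wandb

sweep_config = {
    "method": "bayes",
    "metric": {"name": "eval_accuracy", "goal": "maximize"},
    "parameters": {
        "model_name": {
            "values": ["distilbert-base-uncased", "bert-base-uncased"]
        },
        "warmup_steps": {"min": 0, "max": 500},
        "learning_rate": {"min": 1e-5, "max": 5e-5},
    },
}

sweep_id = wandb.sweep(sweep_config, project="banking77")
# wandb.agent(sweep_id, function=train_fn)  # train_fn wraps the training logic
```

Let's move on to one more W&B feature quickly.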
Model Lineage
It's also important to understand the dependencies and lineage of your data and your models, and W&B Artifacts does just that. Below, you'll see the lineage of each model's weights: from raw dataset, to processed dataset, to train-eval split, and finally to the trained model weights.

Artifacts, of course, supports larger, more complex flows and allows you to collapse and expand steps for easier viewing and understanding.
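For reference, lineage like this is produced simply by logging each step's outputs as artifacts and declaring its inputs with use_artifact. The sketch below uses illustrative artifact names and local paths, not the exact ones from this project:

```python
# A hedged sketch of how artifact lineage is built up across pipeline steps.
# Artifact names, paths, and job types here are illustrative placeholders.
import wandb

# Step 1: log the raw dataset as an artifact.
with wandb.init(project="banking77", job_type="upload") as run:
    raw = wandb.Artifact("banking77-raw", type="dataset")
    raw.add_dir("data/raw")  # placeholder local path
    run.log_artifact(raw)

# Step 2: consume the raw data and log the processed train-eval split.
with wandb.init(project="banking77", job_type="preprocess") as run:
    run.use_artifact("banking77-raw:latest")  # records the input edge
    split = wandb.Artifact("banking77-train-eval", type="dataset")
    split.add_dir("data/processed")
    run.log_artifact(split)

# Step 3: train on the split and log the final model weights.
with wandb.init(project="banking77", job_type="train") as run:
    run.use_artifact("banking77-train-eval:latest")
    weights = wandb.Artifact("banking77-model", type="model")
    weights.add_file("model/pytorch_model.bin")  # placeholder file
    run.log_artifact(weights)
```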
Conclusion
We spend a lot of time ensuring our tools play well with others, and SageMaker and HuggingFace are no exception. This article was meant to give you a quick sense of what W&B can do alongside both, but whatever your infrastructure, we've got you covered.