Text Classification with AWS SageMaker, Hugging Face, and W&B
A tutorial on running EDA and NLP experiments using AWS SageMaker, Hugging Face, and Weights & Biases. Made by Morgan using Weights & Biases
SageMaker, Hugging Face and W&B
This report is a quick introduction to using SageMaker, Hugging Face, and W&B together to run EDA and NLP experiments. Specifically, we'll look at the Banking77 dataset, a dataset of more than 13,000 customer service queries labeled with 77 intents (hence the name). We will:
Explore the Banking77 dataset using W&B Tables
Analyze the results of a hyperparameter tuning experiment run on AWS SageMaker, and
Display key training and evaluation metrics such as train loss and evaluation accuracy
👉 Click here to see the SageMaker Notebook and training script
👉 Click here to see a live W&B dashboard from this project
Let's get started!
Explore the Banking77 Dataset
First, let's dig into the dataset with W&B Tables. Tables lets you actually see your data: you can interact with it, sort it, and evaluate it, uncover the relationships within it, and simply understand it better than you can in more static formats. Here's a table sorted only by a simple id:
This is, of course, just a quick example of what you can do with Tables. The feature was built to be dynamic and customizable, and if you'd like to dig in a little more, we recommend the following reports on Tables, especially the Shakespeare report if you'd like more ideas for working with text data.
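A table like the one described above can be produced with a short script. The sketch below is not the code behind this report: it assumes the dataset is available on the Hugging Face Hub as `PolyAI/banking77` (with a `text` column and a `ClassLabel` column named `label`), and the project name `banking77-eda` and the helpers `build_table_rows` and `intent_counts` are hypothetical. The `wandb` and `datasets` imports live inside `log_banking77_table` so the pure-Python helpers can be reused without them.

```python
from collections import Counter


def build_table_rows(texts, labels, label_names):
    """Turn parallel text/label sequences into [id, text, label_id, intent] rows."""
    texts, labels = list(texts), list(labels)
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    return [
        [i, text, label, label_names[label]]
        for i, (text, label) in enumerate(zip(texts, labels))
    ]


def intent_counts(rows):
    """Count queries per intent, a quick way to spot class imbalance before training."""
    return Counter(row[3] for row in rows)


def log_banking77_table(project="banking77-eda", split="train"):
    # Imported here so the helpers above can be used without wandb or datasets installed.
    import wandb
    from datasets import load_dataset

    # Assumption: Hub id "PolyAI/banking77" with a "text" column and a ClassLabel "label".
    ds = load_dataset("PolyAI/banking77", split=split)
    rows = build_table_rows(ds["text"], ds["label"], ds.features["label"].names)

    run = wandb.init(project=project, job_type="eda")
    table = wandb.Table(columns=["id", "text", "label_id", "intent"], data=rows)
    run.log({f"banking77_{split}": table})
    run.finish()
```

Calling `log_banking77_table()` after `wandb login` logs one table for the chosen split; the `intent` column is the natural one to sort or group by in the UI.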
Hyperparameter Tuning on AWS SageMaker
Next, let's look at the results of a hyperparameter tuning experiment run on AWS SageMaker.
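For context, a SageMaker tuning job over a Hugging Face training script might be launched roughly as follows with the SageMaker Python SDK. This is a hedged sketch, not the notebook behind this report: the entry point `train.py`, the `scripts` directory, the instance type, the framework versions, the W&B project name, and the search space are all assumptions. Because the tuner runs a custom script rather than a built-in algorithm, it needs a `metric_definitions` regex to read the objective from the training logs; the regex here assumes the script prints its evaluation metrics as a Python dict, i.e. a line such as `{'eval_loss': ..., 'eval_accuracy': ...}`.

```python
import re

# Assumed log format: the training script prints a dict containing 'eval_accuracy'.
# SageMaker applies this regex to the job logs and reads the first capture group.
EVAL_ACCURACY_REGEX = r"'eval_accuracy': ([0-9.eE+-]+)"
METRIC_DEFINITIONS = [{"Name": "eval_accuracy", "Regex": EVAL_ACCURACY_REGEX}]


def parse_eval_accuracy(log_line):
    """Apply the same regex locally, handy for checking it before launching jobs."""
    match = re.search(EVAL_ACCURACY_REGEX, log_line)
    return float(match.group(1)) if match else None


def launch_tuning(role, wandb_api_key):
    # Imported lazily so the regex helper above can be tested without the SageMaker SDK.
    from sagemaker.huggingface import HuggingFace
    from sagemaker.tuner import CategoricalParameter, ContinuousParameter, HyperparameterTuner

    estimator = HuggingFace(
        entry_point="train.py",  # hypothetical script name
        source_dir="scripts",  # hypothetical directory holding train.py
        instance_type="ml.p3.2xlarge",
        instance_count=1,
        role=role,
        transformers_version="4.26",  # assumed version combination
        pytorch_version="1.13",
        py_version="py39",
        hyperparameters={"epochs": 3},
        # wandb inside the training container reads these environment variables.
        environment={"WANDB_API_KEY": wandb_api_key, "WANDB_PROJECT": "banking77"},
    )
    tuner = HyperparameterTuner(
        estimator,
        objective_metric_name="eval_accuracy",
        # Hypothetical search space; the exact ranges used in this report are not listed.
        hyperparameter_ranges={
            "learning_rate": ContinuousParameter(1e-5, 1e-4, scaling_type="Logarithmic"),
            "model_name": CategoricalParameter(["distilroberta-base", "bert-base-uncased"]),
        },
        metric_definitions=METRIC_DEFINITIONS,
        objective_type="Maximize",
        max_jobs=8,
        max_parallel_jobs=2,
    )
    tuner.fit()
    return tuner
```

`launch_tuning` expects an IAM role ARN with SageMaker permissions, and `wandb` must be installed in the training container, for example through a `requirements.txt` in `source_dir`. Passing the API key as an environment variable stores it in the training job's configuration, so a secrets store is the safer option beyond quick experiments.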
Here, we fine-tuned Hugging Face models and logged the results with the Weights & Biases integration for the Hugging Face Trainer. We ran hyperparameter searches over:
Note: This section of the report is filtered down to runs with Job Type = "HyperparameterTuning"
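On the training side, the W&B integration is switched on through `TrainingArguments(report_to="wandb")`, after which the Trainer logs losses and evaluation metrics to W&B. The sketch below is an assumption-laden illustration rather than this report's script: it assumes a SageMaker script-mode entry point that receives hyperparameters as `--name value` arguments, and the argument names, defaults, and the `PolyAI/banking77` Hub id are hypothetical. Evaluation runs once after training via `trainer.evaluate()`, which sidesteps the evaluation-strategy argument whose name differs across `transformers` versions.

```python
import argparse
import os


def parse_args(argv=None):
    """Read hyperparameters that SageMaker script mode passes as --name value arguments."""
    parser = argparse.ArgumentParser()
    # Hypothetical hyperparameter names and defaults; match them to the tuner's ranges.
    parser.add_argument("--model_name", type=str, default="distilroberta-base")
    parser.add_argument("--learning_rate", type=float, default=5e-5)
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--train_batch_size", type=int, default=32)
    # SM_MODEL_DIR is set inside SageMaker training containers; fall back to a local folder.
    parser.add_argument("--output_dir", type=str, default=os.environ.get("SM_MODEL_DIR", "model"))
    # parse_known_args ignores extra arguments instead of failing on them.
    args, _unknown = parser.parse_known_args(argv)
    return args


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


def main():
    # Heavy imports live here so parse_args and accuracy stay importable without them.
    import numpy as np
    from datasets import load_dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        DataCollatorWithPadding,
        Trainer,
        TrainingArguments,
    )

    args = parse_args()
    ds = load_dataset("PolyAI/banking77")  # assumed Hub id with "train" and "test" splits
    tokenizer = AutoTokenizer.from_pretrained(args.model_name)
    ds = ds.map(lambda batch: tokenizer(batch["text"], truncation=True), batched=True)
    model = AutoModelForSequenceClassification.from_pretrained(
        args.model_name, num_labels=ds["train"].features["label"].num_classes
    )

    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        # The Trainer prefixes this key, so it is reported as "eval_accuracy".
        return {"accuracy": accuracy(np.argmax(logits, axis=-1).tolist(), labels.tolist())}

    training_args = TrainingArguments(
        output_dir=args.output_dir,
        learning_rate=args.learning_rate,
        num_train_epochs=args.epochs,
        per_device_train_batch_size=args.train_batch_size,
        report_to="wandb",  # turns on the Weights & Biases callback
    )
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=ds["train"],
        eval_dataset=ds["test"],
        data_collator=DataCollatorWithPadding(tokenizer),
        compute_metrics=compute_metrics,
    )
    trainer.train()
    metrics = trainer.evaluate()  # also logged to W&B by the callback
    print(metrics)  # puts eval_accuracy in the job logs as well
    trainer.save_model(args.output_dir)
```

In a real `train.py`, `main()` would be called under an `if __name__ == "__main__":` guard. Printing the metrics dict makes `eval_accuracy` visible in the job logs, which is where a SageMaker metric regex would look for it.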
Let's first look at this plot, which shows the importance of each hyperparameter. Hovering over any line shows the specific values for that particular run.
Our Sweeps tool is a great way to understand which parameters actually matter to your model's performance. Let's quickly move on to one more W&B feature.
It's also important to understand the dependencies and lineage of your data and models, and W&B Artifacts tracks exactly that. Below, you'll see the lineage of each model's weights, from the raw dataset to the processed dataset to the train-eval split and, finally, to the trained model weights.
Artifacts also supports larger, more complex pipelines and lets you collapse and expand steps to make them easier to view and understand.
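A lineage graph like this comes from runs that consume one artifact with `use_artifact` and produce the next with `log_artifact`. The sketch below chains three runs (upload, preprocess, split); the artifact names, JSON file layout, and the `preprocess` cleanup rule are hypothetical, not this report's pipeline. A training run would continue the chain by calling `use_artifact("banking77-split:latest")` and logging its weights as an artifact of type `model`.

```python
import json
import os
import random


def preprocess(records):
    """Hypothetical cleanup step: trim whitespace and drop empty queries."""
    return [
        {"text": r["text"].strip(), "label": r["label"]}
        for r in records
        if r["text"].strip()
    ]


def train_eval_split(records, eval_fraction=0.2, seed=0):
    """Seeded shuffle-and-split, so re-running the step yields the same split."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]


def _write_json(path, records):
    with open(path, "w") as f:
        json.dump(records, f)
    return path


def _read_artifact_json(run, artifact_name, filename):
    # use_artifact records the dependency edge that the lineage graph displays.
    directory = run.use_artifact(artifact_name).download()
    with open(os.path.join(directory, filename)) as f:
        return json.load(f)


def log_lineage(raw_records, project="banking77"):
    import wandb  # imported lazily so the helpers above work without wandb installed

    with wandb.init(project=project, job_type="upload") as run:
        artifact = wandb.Artifact("banking77-raw", type="dataset")
        artifact.add_file(_write_json("raw.json", raw_records))
        run.log_artifact(artifact)

    with wandb.init(project=project, job_type="preprocess") as run:
        raw = _read_artifact_json(run, "banking77-raw:latest", "raw.json")
        artifact = wandb.Artifact("banking77-processed", type="dataset")
        artifact.add_file(_write_json("processed.json", preprocess(raw)))
        run.log_artifact(artifact)

    with wandb.init(project=project, job_type="split") as run:
        processed = _read_artifact_json(run, "banking77-processed:latest", "processed.json")
        train, eval_ = train_eval_split(processed)
        artifact = wandb.Artifact("banking77-split", type="dataset")
        artifact.add_file(_write_json("train.json", train))
        artifact.add_file(_write_json("eval.json", eval_))
        run.log_artifact(artifact)
```

Running `log_lineage` on a list of `{"text": ..., "label": ...}` records produces the raw, processed, and split dataset nodes in the Artifacts lineage view, each connected through the run that created it.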
We spend a lot of time making sure our tools play well with others, and SageMaker and Hugging Face are no exception. This report is meant to give you a quick sense of what W&B can do alongside both, but whatever your infrastructure, we've got you covered.