
SimpleTransformers: Transformers Made Easy

This article looks at SimpleTransformers, which removes the complexity and lets you get down to what matters – model training and experimenting with the Transformer model architectures.
HuggingFace Transformers, an open-source library, is a one-stop shop for thousands of pre-trained models. The API is well thought out and easy to use. However, there is still a level of complexity, and some technical know-how is needed to make everything work like a charm.
Enter SimpleTransformers, which removes the complexity and lets you get down to what matters – model training and experimenting with Transformer model architectures. It helps you bypass the complicated setup, boilerplate code, and all the other general unpleasantness by:
  • initializing a model in one line
  • training in the next
  • and evaluating in the third line.
In this article, we will build a sentiment classifier on the IMDB dataset using both HuggingFace and SimpleTransformers. We will then take Simple Transformers for a test drive on a small multi-class classification project.

The HuggingFace Way

In this section, we will try to build a simple sentiment classifier using HuggingFace APIs. We will use the IMDB dataset to fine-tune a DistilBERT model.

Try it out in Google Colab →

You can build a sentiment classifier with a few lines of code using HuggingFace Transformers. But there are some gotchas, especially for those just starting out with Transformers.

1. Imports

Different transformer models need different imports. HuggingFace supports two deep learning frameworks, PyTorch and TensorFlow; the TF prefix is used for TensorFlow-specific imports.
from transformers import DistilBertTokenizerFast
from transformers import TFDistilBertForSequenceClassification, TFTrainer, TFTrainingArguments
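For comparison, the PyTorch counterparts drop the TF prefix. A quick illustration (these classes are not used in the rest of this walkthrough):
# PyTorch equivalents of the TensorFlow classes imported above
from transformers import DistilBertForSequenceClassification, Trainer, TrainingArguments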
For our sentiment classifier, we will primarily need a tokenizer, a model, and a trainer.

2. Tokenizer

Tokenization is one of the most common pre-processing tasks in NLP: given a sentence, the task is to chop it up into pieces, called tokens. Transformer inputs must be tokenized as well, and different transformer models require different tokenizers, so you need to import the matching tokenization module.
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')

train_encodings = tokenizer(train_texts, truncation=True, padding=True)
val_encodings = tokenizer(val_texts, truncation=True, padding=True)
test_encodings = tokenizer(test_texts, truncation=True, padding=True)
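TFTrainer expects tf.data.Dataset objects rather than raw encodings. A minimal sketch of the conversion, assuming train_labels, val_labels, and test_labels are Python lists of 0/1 sentiment labels lined up with the texts tokenized above:
import tensorflow as tf

# Pair each set of encodings with its labels
train_dataset = tf.data.Dataset.from_tensor_slices((dict(train_encodings), train_labels))
val_dataset = tf.data.Dataset.from_tensor_slices((dict(val_encodings), val_labels))
test_dataset = tf.data.Dataset.from_tensor_slices((dict(test_encodings), test_labels))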

3. Training Arguments

HuggingFace provides a simple but feature-complete training and evaluation interface. Using TrainingArguments (or TFTrainingArguments for TensorFlow), you can configure a wide range of training options and get built-in features like logging, gradient accumulation, and mixed precision. Learn more about the available training arguments here.
training_args = TFTrainingArguments(
    output_dir='./results',          # where checkpoints and outputs are written
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,                # linear learning-rate warmup over the first 500 steps
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,                # log metrics every 10 steps
)

4. Get Model

One can download any pre-trained transformer model using simple HuggingFace APIs. However, one still needs to import the correct module.
model = TFDistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")

5. Train

The Trainer and TFTrainer APIs provide the interface to train and evaluate the transformer model on your downstream task.
trainer = TFTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset
)

trainer.train()
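Once training finishes, the same trainer can score the held-out splits. A minimal sketch, assuming the evaluate() and predict() methods exposed by TFTrainer:
# Evaluate on the eval_dataset passed to the trainer at construction time
eval_metrics = trainer.evaluate()
print(eval_metrics)

# Raw logits for the test split (test_dataset built the same way as train_dataset)
test_logits = trainer.predict(test_dataset).predictions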
HuggingFace comes with a Weights & Biases integration: trainer.train() automatically logs all the metrics to your W&B dashboard while training. Below are the charts logged while training our sentiment classifier.



[W&B panel grid: 1 run, training metrics]



The Simple Transformers Way

Simple Transformers avoids all the complexity we saw in the HuggingFace section. It provides an even higher level of abstraction over the HuggingFace APIs so you can train and evaluate Transformer models quickly: for any downstream fine-tuning task, only three lines of code are required. Let us build the same sentiment classifier using Simple Transformers.

Fine-tune Simple Transformers on Google Colab →

Learn how Simple Transformers is meant to be beginner-friendly in this post by Thilina Rajapakse, the creator of Simple Transformers.

1. Imports

Using Simple Transformers is as easy as one line of import. For each downstream task, there is a single module to import; for example, the import shown in the code snippet below is all you need for text classification.
from simpletransformers.classification import ClassificationModel

2. Training Arguments

You just need a dictionary, train_args, to provide training arguments. Check out all the available arguments here.
train_args = {
    'num_train_epochs': 3,
    'train_batch_size': 16,
    'eval_batch_size': 64,
    'warmup_steps': 500,
    'weight_decay': 0.01,
    'logging_steps': 10,
    'learning_rate': 5e-5,
    'fp16': False,                        # disable mixed precision
    'wandb_project': 'gallery',           # W&B project to log to
    'wandb_kwargs': {'entity': 'wandb'}   # extra arguments passed to wandb.init
}

3. Initialize Task-Specific Model

All we need to do is initialize the task-specific model. Since we are doing classification, we initialize a ClassificationModel. Unlike HuggingFace, where we must import the correct module to use a pre-trained model, with Simple Transformers we simply pass the model type and name as arguments, along with train_args. Note that there was no need to initialize a tokenizer: Simple Transformers applies the correct tokenization automatically, although you can specify a tokenizer explicitly as well.
model = ClassificationModel('distilbert', 'distilbert-base-uncased', use_cuda=True, cuda_device=0, args=train_args)

4. Train

Training the model is as simple as passing your dataset to model.train_model. It first applies the appropriate tokenization and then trains your model. Simple Transformers comes with Weights & Biases integration, which will log all your training metrics automatically.
model.train_model(train_df)
The result of the training is shown below.



[W&B panel grid: 2 runs, training metrics]
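Evaluation and inference follow the same one-liner pattern. A minimal sketch, assuming a held-out pandas DataFrame eval_df with the same two columns (text, labels) as train_df:
# Evaluate on a held-out DataFrame: returns metrics, raw model outputs,
# and the examples the model got wrong
result, model_outputs, wrong_predictions = model.eval_model(eval_df)

# Predict the sentiment of new, unseen reviews
predictions, raw_outputs = model.predict(["This movie was an absolute delight."])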



Taking SimpleTransformers For A Test Drive

In this simple project, we will build a multi-class classification model using Simple Transformers. We will use this dataset, originally posted in this blog post, which contains several thousand programming questions posted on Stack Overflow. Each question has exactly one tag (Python, CSharp, JavaScript, or Java), and the task is to predict the tag for a given question.

Try out the experiment on Google Colab →

Since it's a multi-class classification problem, let's initialize our model to reflect that. Note the num_labels argument.
# labels
LABELS = ['csharp', 'java', 'javascript', 'python']
# initialize model
model = ClassificationModel('distilbert', 'distilbert-base-cased', num_labels=4, use_cuda=True, cuda_device=0, args=train_args)
Training the model is as easy as calling model.train_model(train_df). The result of the training is shown below.



[W&B panel grid: 2 runs, training metrics]
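After training, model.predict() returns label indices, which map straight back onto LABELS, assuming the labels in train_df were encoded in the same order as the LABELS list. A quick sketch with a made-up question:
# Hypothetical example question; predict() returns (label indices, raw outputs)
questions = ["How do I read a CSV file into a pandas DataFrame?"]
predictions, raw_outputs = model.predict(questions)
print(LABELS[predictions[0]])  # should print something like 'python'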



Final words

There are many more NLP tasks for which Simple Transformers has built-in support. Some of them, with their task modules sketched after this list, are:
  • Token Classification
  • Question Answering
  • Language Modeling
  • Language Generation
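Each of these follows the same import/initialize/train pattern. The corresponding task modules look roughly like this (a sketch of the module names; check the Simple Transformers docs for the exact APIs):
from simpletransformers.ner import NERModel                                 # token classification
from simpletransformers.question_answering import QuestionAnsweringModel    # question answering
from simpletransformers.language_modeling import LanguageModelingModel      # language modeling
from simpletransformers.language_generation import LanguageGenerationModel  # language generation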
Check out Using SimpleTransformers on common NLP applications by Ayush Chaurasia to learn more about what Simple Transformers can do.
I hope you enjoyed this short tutorial on fine-tuning transformers with Simple Transformers.
Have you tried Simple Transformers before? What has your experience been like? Share it in the comments!
