Simple Transformers removes complexity and lets you get down to what matters – model training and experimenting with the Transformer model architectures.
Ayush Thakur

## Introduction

Hugging Face Transformers, an open-source library, is a one-stop shop for thousands of pre-trained models. The API design is well thought out and easy to use. However, there is still a level of complexity, and some technical know-how is needed to make it work like a charm.

Enter Simple Transformers, which removes complexity and lets you get down to what matters – model training and experimenting with the Transformer model architectures. It helps you bypass the complicated setup, boilerplate code, and other general unpleasantness by:

• initializing a model in one line
• training in the next
• and evaluating in the third line.

In this brief report, we will build a sentiment classifier on the IMDB dataset using both Hugging Face and Simple Transformers. We will then work through a small project using Simple Transformers.

## The Hugging Face Way

In this section, we will build a simple sentiment classifier using Hugging Face APIs. We will use the IMDB dataset to fine-tune a DistilBERT model.

#### Try it out in Google Colab $\rightarrow$

You can build a sentiment classifier with a few lines of code using Hugging Face Transformers. But there are some gotchas, especially for those just starting out with Transformers.

#### 1. Imports

Different transformer models need different imports. Hugging Face supports two deep learning frameworks: PyTorch and TensorFlow. The TF prefix marks TensorFlow-specific imports.

from transformers import DistilBertTokenizerFast
from transformers import TFDistilBertForSequenceClassification, TFTrainer, TFTrainingArguments


For our sentiment classifier, we will primarily need a tokenizer, a model, and a trainer.

#### 2. Tokenizer

Tokenization is one of the most common pre-processing tasks in NLP. Given a sentence, the task is to chop it up into pieces, called tokens. Even transformers need inputs to be tokenized. However, different transformers require different tokenization modules, and thus one needs to import the right module.

tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')
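
To make the idea concrete, here is a minimal, library-free sketch of what a tokenizer produces: token ids plus an attention mask, padded or truncated to a fixed length. The tiny vocabulary, the `toy_encode` function, and the whitespace splitting are illustrative assumptions. DistilBERT's real tokenizer splits text into WordPiece subwords and uses a vocabulary of roughly 30,000 entries.

```python
# Toy vocabulary (hypothetical); the ids are arbitrary and not DistilBERT's.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "the": 4, "movie": 5, "was": 6, "great": 7}


def toy_encode(text, max_length=8):
    """Lowercase, split on whitespace, map tokens to ids, then pad or truncate."""
    # Reserve two positions for the special start and end tokens.
    tokens = ["[CLS]"] + text.lower().split()[: max_length - 2] + ["[SEP]"]
    # Words missing from the vocabulary fall back to the unknown token.
    input_ids = [VOCAB.get(token, VOCAB["[UNK]"]) for token in tokens]
    attention_mask = [1] * len(input_ids)
    # Pad to a fixed length; the mask marks padding positions with 0.
    padding = max_length - len(input_ids)
    input_ids += [VOCAB["[PAD]"]] * padding
    attention_mask += [0] * padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


encoded = toy_encode("The movie was great")
print(encoded["input_ids"])
print(encoded["attention_mask"])
```

With the real tokenizer, a call such as `tokenizer(texts, truncation=True, padding=True)` returns the same two fields for a whole batch of texts.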



#### 3. Training Arguments

Hugging Face provides a simple but feature-complete training and evaluation interface. Using TrainingArguments or TFTrainingArguments, one can set a wide range of training options and use built-in features like logging, gradient accumulation, and mixed precision. Learn more about different training arguments here.

training_args = TFTrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
)
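
Of these options, `warmup_steps` is perhaps the least self-explanatory: it controls how the learning rate ramps up at the start of training. The sketch below assumes a linear warmup followed by a linear decay to zero, a common schedule for fine-tuning Transformers. The function name `lr_multiplier` and the total of 4000 steps are hypothetical values chosen for illustration, and the base learning rate of 5e-5 is the value used later in this report.

```python
def lr_multiplier(step, warmup_steps=500, total_steps=4000):
    """Factor applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to 1 during warmup.
        return step / warmup_steps
    # Then decay linearly from 1 down to 0 at the final step.
    return max(0.0, (total_steps - step) / (total_steps - warmup_steps))


base_lr = 5e-5  # assumed base learning rate
for step in (0, 250, 500, 2250, 4000):
    print(step, base_lr * lr_multiplier(step))
```

The learning rate peaks exactly when warmup ends, so a larger `warmup_steps` delays the point at which the model sees its full learning rate.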


#### 4. Get Model

One can download any pre-trained transformer model using simple Hugging Face APIs. However, one still needs to import the correct module.

model = TFDistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")


#### 5. Train

Trainer or TFTrainer APIs provide the interface to train and evaluate the transformer model on your downstream task.

trainer = TFTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

trainer.train()


Hugging Face Transformers comes with a Weights and Biases integration: trainer.train() automatically logs all training metrics to your W&B dashboard. Below are the charts logged while training our sentiment classifier.

## The Simple Transformers Way

Simple Transformers avoids the complexity that we saw in the Hugging Face section. It adds a further layer of abstraction over the Hugging Face APIs so that you can train and evaluate Transformer models quickly. For all downstream fine-tuning tasks, only three lines of code are required. Let us build a sentiment classifier using Simple Transformers.

#### Fine-tune Simple Transformer on Google Colab $\rightarrow$

Learn how Simple Transformers is designed to be beginner-friendly in this post by its creator, Thilina Rajapakse.

#### 1. Imports

Using Simple Transformers is as easy as one line of import. For each downstream task, there is one module that is to be imported. For example, the import shown in the code snippet is all you need for text classification.

from simpletransformers.classification import ClassificationModel


#### 2. Training Arguments

You just need a dictionary, train_args, to provide training arguments. Check out all the available arguments here.

train_args = {
    'num_train_epochs': 3,
    'train_batch_size': 16,
    'eval_batch_size': 64,
    'warmup_steps': 500,
    'weight_decay': 0.01,
    'logging_steps': 10,
    'learning_rate': 5e-5,
    'fp16': False,
    'wandb_project': 'gallery',
    'wandb_kwargs': {'entity': 'wandb'},
}
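
Most of these keys mirror the Hugging Face arguments used earlier; only the batch-size names differ. The helper below is a hypothetical sketch (`to_simple_transformers_args` is not part of either library) that makes the correspondence explicit. It assumes single-GPU training, where the per-device batch size equals the overall batch size, and it drops the output and logging paths because the dictionary in this report leaves them at their defaults.

```python
# Hugging Face argument names whose Simple Transformers key differs.
RENAMED_KEYS = {
    "per_device_train_batch_size": "train_batch_size",
    "per_device_eval_batch_size": "eval_batch_size",
}
# Keys omitted from this report's Simple Transformers dictionary.
DROPPED_KEYS = {"output_dir", "logging_dir"}


def to_simple_transformers_args(hf_args, **extra):
    """Rename or drop Hugging Face-style keys, then add any extra settings."""
    args = {
        RENAMED_KEYS.get(key, key): value
        for key, value in hf_args.items()
        if key not in DROPPED_KEYS
    }
    args.update(extra)
    return args


hf_args = {
    "output_dir": "./results",
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 64,
    "warmup_steps": 500,
    "weight_decay": 0.01,
    "logging_dir": "./logs",
    "logging_steps": 10,
}
converted = to_simple_transformers_args(hf_args, learning_rate=5e-5, fp16=False)
print(converted)
```

Everything except the W&B settings can be derived this way, which shows how little changes conceptually between the two libraries.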


#### 3. Get Model

All we need to do is initialize the task-specific model. Since we are doing classification here, we initialize ClassificationModel. Unlike Hugging Face, where we need to import the correct module to use a pre-trained model, Simple Transformers only needs the model type and the model's name as arguments. We can also pass in train_args. Note that there was no need to initialize a tokenizer: Simple Transformers applies the correct tokenization automatically. However, you can specify the name of the tokenizer as well.

model = ClassificationModel('distilbert', 'distilbert-base-uncased', use_cuda=True, cuda_device=0, args=train_args)


#### 4. Train

Training the model is as simple as passing your dataset to model.train_model. It first applies the appropriate tokenization and then trains your model. Simple Transformers comes with Weights and Biases integration, which will log all your training metrics automatically.

model.train_model(train_df)
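
The snippet above assumes that `train_df` already exists. Per the Simple Transformers documentation, classification models expect a pandas DataFrame with a `text` column and an integer `labels` column. The sketch below shows one way to build it; the two reviews, the `SENTIMENT_TO_ID` mapping, and the `build_train_df` helper are made up for illustration and are not taken from the IMDB dataset or the original notebook.

```python
# Illustrative reviews (invented for this sketch, not taken from IMDB).
reviews = [
    ("A moving story with wonderful performances.", "positive"),
    ("Two hours of my life I will never get back.", "negative"),
]
SENTIMENT_TO_ID = {"negative": 0, "positive": 1}

# One row per example: the text first, then its integer label.
rows = [[text, SENTIMENT_TO_ID[sentiment]] for text, sentiment in reviews]


def build_train_df(rows):
    """Wrap the rows in a DataFrame with 'text' and 'labels' columns."""
    import pandas as pd  # imported here so the rows can be inspected without pandas

    return pd.DataFrame(rows, columns=["text", "labels"])


print(rows)
```

Calling `train_df = build_train_df(rows)` then gives a frame that can be passed to `model.train_model(train_df)`.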


The result of the training is shown below.

## Taking Simple Transformers For A Test Drive

In this simple project, we will build a multi-class classification model using Simple Transformers. We will be using this dataset, originally posted in this blog post, which contains several thousand programming questions posted on Stack Overflow. Each of these questions has exactly one tag (Python, CSharp, JavaScript, or Java). The task is to classify the input question into a tag.

#### Try out the experiment on Google Colab $\rightarrow$

Since this is a multi-class classification problem, let's initialize our model accordingly. Note the num_labels argument.

# labels
LABELS = ['csharp', 'java', 'javascript', 'python']
# initialize model
model = ClassificationModel('distilbert', 'distilbert-base-cased', num_labels=4, use_cuda=True, cuda_device=0, args=train_args)
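
With `num_labels=4`, the model works with integer class ids from 0 to 3, so string tags need to be mapped to ids before training and back to tags after prediction. The sketch below shows one way to do this; the helper names `encode_tags` and `decode_predictions` are hypothetical, and it assumes the dataset stores each tag as one of the strings in `LABELS`.

```python
LABELS = ["csharp", "java", "javascript", "python"]

# Map each tag to an integer id in [0, num_labels) and back again.
label2id = {label: index for index, label in enumerate(LABELS)}
id2label = {index: label for label, index in label2id.items()}


def encode_tags(tags):
    """Convert string tags into the integer labels used for training."""
    return [label2id[tag] for tag in tags]


def decode_predictions(predictions):
    """Convert predicted class ids back into readable tags."""
    return [id2label[int(prediction)] for prediction in predictions]


print(encode_tags(["python", "java"]))
print(decode_predictions([2, 0]))
```

After training, the predicted class ids returned by `model.predict(...)` can be passed through `decode_predictions` to recover the tag names.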


Training the model is as easy as calling model.train_model(). The result of the training is shown below.

## Final words

There are many more NLP tasks for which Simple Transformers has built-in support. Some of them are:

• Token Classification