Early Stopping in HuggingFace - Examples

Fine-tune your HuggingFace Transformer with early stopping regularization. Made by Ayush Thakur using Weights & Biases

Introduction

In this report, we'll see examples of how to use early stopping regularization to fine-tune your HuggingFace Transformer. We will cover early stopping in native PyTorch and TensorFlow workflows, alongside HuggingFace's Trainer API.


Native TensorFlow

Fine-tune HuggingFace Transformer using TF in Colab $\rightarrow$

If you are using TensorFlow (Keras) to fine-tune a HuggingFace Transformer, adding early stopping is very straightforward with the tf.keras.callbacks.EarlyStopping callback. It takes the name of the metric to monitor and the number of epochs without improvement after which training will be stopped.

# Stop if val_loss does not improve for 5 consecutive epochs,
# and restore the weights from the best epoch when training stops.
early_stopper = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                 patience=5,
                                                 restore_best_weights=True)

Here early_stopper is the callback that can be passed to model.fit:

model.fit(trainloader, 
          epochs=10, 
          validation_data=validloader,
          callbacks=[early_stopper])
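For context, here is a rough sketch of how the model and the trainloader / validloader datasets passed to model.fit above might be prepared. The checkpoint, number of labels, optimizer settings, and the assumption that the data loaders are tf.data.Dataset objects of (features, labels) pairs are illustrative choices, not the exact setup from the Colab.

import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

# Minimal sketch: a sequence classification head on top of a pretrained encoder.
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
                                                             num_labels=2)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])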

Native PyTorch

Fine-tune HuggingFace Transformer using PyTorch in Colab $\rightarrow$

Native PyTorch does not have an off-the-shelf early stopping method. But if you are fine-tuning your HuggingFace Transformer using native PyTorch, here's a GitHub Gist that provides a working early stopping hook.

import numpy as np


class EarlyStopping(object):
    def __init__(self, mode='min', min_delta=0, patience=10, percentage=False):
        self.mode = mode
        self.min_delta = min_delta
        self.patience = patience
        self.best = None
        self.num_bad_epochs = 0
        self.is_better = None
        self._init_is_better(mode, min_delta, percentage)

        if patience == 0:
            self.is_better = lambda a, b: True
            self.step = lambda a: False

    def step(self, metrics):
        # Returns True when training should stop.
        if self.best is None:
            self.best = metrics
            return False

        if np.isnan(metrics):
            return True

        if self.is_better(metrics, self.best):
            self.num_bad_epochs = 0
            self.best = metrics
        else:
            self.num_bad_epochs += 1

        if self.num_bad_epochs >= self.patience:
            print('terminating because of early stopping!')
            return True

        return False

    def _init_is_better(self, mode, min_delta, percentage):
        if mode not in {'min', 'max'}:
            raise ValueError('mode ' + mode + ' is unknown!')
        if not percentage:
            if mode == 'min':
                self.is_better = lambda a, best: a < best - min_delta
            if mode == 'max':
                self.is_better = lambda a, best: a > best + min_delta
        else:
            if mode == 'min':
                self.is_better = lambda a, best: a < best - (
                            best * min_delta / 100)
            if mode == 'max':
                self.is_better = lambda a, best: a > best + (
                            best * min_delta / 100)


To use early stopping in your training loop, check out the Colab notebook linked above. The loop itself looks like this:

es = EarlyStopping(patience=5)

num_epochs = 100
for epoch in range(num_epochs):
    train_one_epoch(model, data_loader)  # train the model for one epoch
    metric = eval(model, data_loader_dev)  # evaluation on the dev set
    if es.step(metric):
        break  # early stop criterion is met, we can stop now
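The train_one_epoch and eval helpers are defined in the notebook. As a rough illustration (not the notebook's exact code), an evaluation function that returns the monitored metric, here the mean validation loss so the default mode='min' applies, could look like this; the function name, device handling, and batch format are assumptions.

import torch

def evaluate(model, data_loader, device="cuda"):
    # Returns the mean validation loss; lower is better, matching mode='min'.
    model.eval()
    total_loss, num_batches = 0.0, 0
    with torch.no_grad():
        for batch in data_loader:
            # Assumes each batch is a dict of tensors that includes labels,
            # so the HuggingFace model returns a loss in its output.
            batch = {k: v.to(device) for k, v in batch.items()}
            outputs = model(**batch)
            total_loss += outputs.loss.item()
            num_batches += 1
    return total_loss / num_batches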

I highly recommend reorganizing your PyTorch code using PyTorch Lightning. It provides early stopping and many other techniques off-the-shelf.
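As a quick sketch of what that looks like (assuming you already have a LightningModule, called lit_model below as a placeholder, that logs a val_loss metric during validation):

import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping

# Stop when the logged "val_loss" has not improved for 5 validation checks.
early_stop = EarlyStopping(monitor="val_loss", patience=5, mode="min")

trainer = pl.Trainer(max_epochs=100, callbacks=[early_stop])
trainer.fit(lit_model, train_dataloader, val_dataloader)

If you are not familiar with PyTorch Lightning, here are some reports that will get you started: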