
Recurrent Neural Network Regularization With Keras

A short tutorial on how to use regularization methods for Recurrent Neural Networks (RNNs) in Keras, with a Colab to help you follow along.


An Introduction To Regularization For Recurrent Neural Networks

In this report, we'll walk through how you can use regularization methods in Recurrent Neural Networks to make your model more robust and help it perform better. First, a quick review of the methods:
The $L^p$ norm of a vector $x$ is given by:

$$\| x \|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{\frac{1}{p}}$$
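For instance, setting $p = 1$ and $p = 2$ gives the two norms used in the definitions below:

$$\| x \|_1 = \sum_{i=1}^n |x_i|, \qquad \| x \|_2 = \sqrt{\sum_{i=1}^n x_i^2}$$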

Using the above definition, we define $L_1$ and $L_2$ regularization of a loss $L$ with weight vector $w$ as:

$$\begin{array}{ll} L_1 \longrightarrow L_{reg} &= L + \lambda \| w \|_1 \\ L_2 \longrightarrow L_{reg} &= L + \lambda \| w \|_2 \end{array}$$

where $\lambda$ controls the strength of the regularization.

$L_1$ regularization can be viewed as adding the sum of the absolute values of the weights to the loss, while $L_2$ regularization can be viewed as adding the square root of the sum of the squared weights (the Euclidean norm).
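To make these definitions concrete, here's a minimal NumPy sketch that computes both penalty terms for a small weight vector; the weight values and $\lambda$ below are made up purely for illustration:

import numpy as np

# Hypothetical weight vector and regularization strength, purely for illustration
w = np.array([0.5, -1.2, 0.3, 2.0])
lam = 0.01

l1_penalty = lam * np.sum(np.abs(w))        # lambda * ||w||_1 = 0.01 * 4.0
l2_penalty = lam * np.sqrt(np.sum(w ** 2))  # lambda * ||w||_2 ≈ 0.01 * 2.40

print(f"L1 penalty: {l1_penalty:.4f}")
print(f"L2 penalty: {l2_penalty:.4f}")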
Lastly, before jumping in, if you'd like to follow along with this piece in a Colab with executable code, you can find that right here:




The Code For Applying Regularization Techniques In Keras

Keras makes it extremely easy to apply different regularization techniques to recurrent cells such as the Long Short-Term Memory (LSTM) unit or the Gated Recurrent Unit (GRU) through the tf.keras API. Each recurrent layer exposes a few arguments that allow for quick prototyping.
Those are:
  • kernel_regularizer: Applies regularization to the kernel weights matrix (the weights applied to the layer's inputs).
  • recurrent_regularizer: Applies regularization to the recurrent kernel weights matrix (the weights applied to the recurrent state).
  • bias_regularizer: Applies regularization to the bias vector.
  • activity_regularizer: Applies regularization to the output (activations) of the layer.
NB: At the moment, TensorFlow supports L1, L2, and their combination (L1 + L2).
Let's see how we can use these in practice!
from tensorflow import keras
from tensorflow.keras import layers
from wandb.keras import WandbCallback

model = keras.Sequential([
    layers.LSTM(..., recurrent_regularizer='l1'),  # <- Substitute with either 'l1', 'l2' or 'l1_l2'
    # ...
])

model.compile(...)

model.fit(
    x_train, y_train, ...,
    callbacks=[WandbCallback()]  # Log metrics to Weights & Biases during training
)
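The string shortcuts such as 'l1' or 'l2' use TensorFlow's default regularization factor. If you want to control the strength $\lambda$ explicitly, you can pass regularizer objects instead. Here's a minimal sketch; the layer sizes and coefficients are placeholder values, not tuned recommendations:

from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    layers.LSTM(
        64,  # placeholder number of units
        kernel_regularizer=regularizers.L2(1e-4),     # L2 on the input kernel
        recurrent_regularizer=regularizers.L1(1e-5),  # L1 on the recurrent kernel
        bias_regularizer=regularizers.L2(1e-4),       # L2 on the biases
    ),
    layers.Dense(1),  # placeholder output head
])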

The Results

Now that you've seen how to use the various regularization methods, let's see how we can use the Weights & Biases Keras callback to easily visualize and compare them using Panels. For example, here's a quick comparison of $L_1$, $L_2$, and $L_1 + L_2$, which you'll find linked in the Colab above:

[W&B panel: comparison of the $L_1$, $L_2$, and $L_1 + L_2$ runs]

As we can see from the plots, $L_2$ happens to be the best regularization method. To further improve the metrics, you can try hyperparameter tuning by changing the batch size or the number of units.
Weights & Biases Sweeps makes this incredibly easy by automatically running your pipeline using an agent. For more details please refer to our Sweeps Quickstart Guide.
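As a rough sketch of what such a sweep could look like (the parameter values, project name, and training function below are placeholder assumptions, not the configuration used in the Colab):

import wandb

# Hypothetical sweep configuration searching over batch size and number of LSTM units
sweep_config = {
    "method": "random",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "batch_size": {"values": [32, 64, 128]},
        "units": {"values": [32, 64, 128]},
    },
}

def train():
    # Placeholder: build, compile, and fit your model here using the sweep's config values
    with wandb.init() as run:
        batch_size = run.config.batch_size
        units = run.config.units
        ...  # model definition and training go here

sweep_id = wandb.sweep(sweep_config, project="rnn-regularization")  # placeholder project name
wandb.agent(sweep_id, function=train, count=10)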
If you'd like to try this yourself, here's the Colab to do so:


Summary

In this article, you saw how to implement regularization in Recurrent Neural Networks using the Keras framework and how Weights & Biases lets you easily compare the various types of regularization. To see the full suite of W&B features, please check out this short 5-minute guide.
If you want more reports covering the math and "from-scratch" code implementations, let us know in the comments down below or on our forum ✨!
Check out these other reports on Fully Connected covering other fundamental development topics like GPU Utilization and Saving Models.
