
Recurrent Neural Network Regularization With Keras

A short tutorial on how to use regularization methods for Recurrent Neural Networks (RNNs) in Keras, with a Colab to help you follow along.


An Introduction To Regularization For Recurrent Neural Networks

In this report, we'll walk through how you can use regularization methods in Recurrent Neural Networks to make your model more robust and help it perform better. First, a quick review of the methods:
The $L^p$ norm of a vector $x$ is given by:

$$\| x \|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{\frac{1}{p}}$$
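For instance, setting $p = 1$ and $p = 2$ gives the two norms used in the definitions below:

$$\| x \|_1 = \sum_{i=1}^n |x_i|, \qquad \| x \|_2 = \sqrt{\sum_{i=1}^n x_i^2}$$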

Using the above definition, we define $L_1$ and $L_2$ regularization of a loss $L$ with weight vector $w$ as:

$$\begin{array}{ll} L_1 \longrightarrow L_{reg} &= L + \lambda \| w \|_1 \\ L_2 \longrightarrow L_{reg} &= L + \lambda \| w \|_2 \end{array}$$

where $\lambda$ controls the strength of the regularization.

$L_1$ regularization can be viewed as adding the sum of the absolute values of the weights to the loss, while $L_2$ regularization can be viewed as adding the square root of the sum of the squared weights (the Euclidean norm).
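To make these definitions concrete, here's a minimal NumPy sketch that computes both penalty terms for a small weight vector; the weight values and $\lambda$ below are made up purely for illustration:

import numpy as np

# Hypothetical weight vector and regularization strength, purely for illustration
w = np.array([0.5, -1.2, 0.3, 2.0])
lam = 0.01

l1_penalty = lam * np.sum(np.abs(w))        # lambda * ||w||_1 = 0.01 * 4.0
l2_penalty = lam * np.sqrt(np.sum(w ** 2))  # lambda * ||w||_2 ≈ 0.01 * 2.40

print(f"L1 penalty: {l1_penalty:.4f}")
print(f"L2 penalty: {l2_penalty:.4f}")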
Lastly, before jumping in, if you'd like to follow along with this piece in a Colab with executable code, you can find that right here:




The Code For Applying Regularization Techniques In Keras

Keras makes it extremely easy to apply different regularization techniques to recurrent cells such as the Long Short-Term Memory (LSTM) unit or the Gated Recurrent Unit (GRU) through the tf.keras API. Each recurrent layer exposes a few arguments that allow for quick prototyping.
Those are:
  • kernel_regularizer: Applies regularization to the kernel weights matrix (the weights applied to the layer's inputs).
  • recurrent_regularizer: Applies regularization to the recurrent kernel weights matrix (the weights applied to the recurrent state).
  • bias_regularizer: Applies regularization to the bias vector.
  • activity_regularizer: Applies regularization to the output (activations) of the layer.
NB: At the moment, TensorFlow supports L1, L2, and their combination (L1 + L2).
Let's see how we can use these in practice!
from tensorflow import keras
from tensorflow.keras import layers
from wandb.keras import WandbCallback

model = keras.Sequential([
    layers.LSTM(..., recurrent_regularizer='l1'),  # <- Substitute with either 'l1', 'l2' or 'l1_l2'
    # ...
])

model.compile(...)

model.fit(
    x_train, y_train, ...,
    callbacks=[WandbCallback()]  # Log metrics to Weights & Biases during training
)
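The string shortcuts such as 'l1' or 'l2' use TensorFlow's default regularization factor. If you want to control the strength $\lambda$ explicitly, you can pass regularizer objects instead. Here's a minimal sketch; the layer sizes and coefficients are placeholder values, not tuned recommendations:

from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    layers.LSTM(
        64,  # placeholder number of units
        kernel_regularizer=regularizers.L2(1e-4),     # L2 on the input kernel
        recurrent_regularizer=regularizers.L1(1e-5),  # L1 on the recurrent kernel
        bias_regularizer=regularizers.L2(1e-4),       # L2 on the biases
    ),
    layers.Dense(1),  # placeholder output head
])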

The Results

Now that you've seen how to use the various regularization methods, let's see how we can use the Weights & Biases Keras callback to easily visualize and compare them using Panels. For example, here's a quick comparison of $L_1$, $L_2$, and $L_1 + L_2$, which you'll find linked in the Colab above:

[W&B panel: comparison of the $L_1$, $L_2$, and $L_1 + L_2$ runs]

As we can see from the plots, $L_2$ happens to be the best regularization method. To further improve the metrics, you can try hyperparameter tuning by changing the batch size or the number of units.
Weights & Biases Sweeps makes this incredibly easy by automatically running your pipeline using an agent. For more details please refer to our Sweeps Quickstart Guide.
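As a rough sketch of what such a sweep could look like (the parameter values, project name, and training function below are placeholder assumptions, not the configuration used in the Colab):

import wandb

# Hypothetical sweep configuration searching over batch size and number of LSTM units
sweep_config = {
    "method": "random",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "batch_size": {"values": [32, 64, 128]},
        "units": {"values": [32, 64, 128]},
    },
}

def train():
    # Placeholder: build, compile, and fit your model here using the sweep's config values
    with wandb.init() as run:
        batch_size = run.config.batch_size
        units = run.config.units
        ...  # model definition and training go here

sweep_id = wandb.sweep(sweep_config, project="rnn-regularization")  # placeholder project name
wandb.agent(sweep_id, function=train, count=10)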
If you'd like to try this yourself, here's the Colab to do so:


Summary

In this article, you saw how to implement regularization in Recurrent Neural Networks using the Keras framework and how Weights & Biases lets you easily compare the various types of regularization. To see the full suite of W&B features, please check out this short 5-minute guide.
If you want more reports covering the math and "from-scratch" code implementations, let us know in the comments down below or on our forum ✨!
Check out these other reports on Fully Connected covering other fundamental development topics like GPU Utilization and Saving Models.
