Use cases of LSTM for different deep learning tasks. Made by Ayush Thakur using Weights & Biases

In this report, I explain long short-term memory (LSTM) networks and how to build them with Keras. There are four principal modes in which a recurrent neural network (RNN) can be run.


LSTMs can be used for a multitude of deep learning tasks using different modes. We will go through each of these modes along with its use case and code snippet in Keras.

One-to-many sequence problems are sequence problems where the input data has one time-step, and the output contains a vector of multiple values or multiple time-steps. Thus, we have a single input and a sequence of outputs. A typical example is image captioning, where the description of an image is generated. Check out this amazing "Generate Meaningful Captions for Images with Attention Models" report by Rajesh Shreedhar Bhat and Souradip Chakraborty to learn more.

We have created a toy dataset shown in the image below. The input data is a sequence of numbers, while the output data is the sequence of the next two numbers after the input number.
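As a rough sketch, a dataset of this kind could be built with NumPy as follows. The exact values in the original figure are not reproduced here, so the range 1 to 15 and the names `x_values`, `X`, and `Y` are assumptions for illustration.

```python
import numpy as np

# Assumption: the inputs are the integers 1..15; the report's figure may use other values.
x_values = np.arange(1, 16)

# Keras LSTM layers expect input shaped (samples, time_steps, features).
# Each sample here has one time step with one feature.
X = x_values.reshape((len(x_values), 1, 1)).astype("float32")

# Each target holds the two numbers that follow the input number.
Y = np.stack([x_values + 1, x_values + 2], axis=1).astype("float32")

# A single test input such as 10 needs the same 3-D layout before prediction.
x_test = np.array([10], dtype="float32").reshape((1, 1, 1))

print(X.shape, Y.shape)  # (15, 1, 1) (15, 2)
```

The `(1, 1)` passed as `input_shape` to the LSTM layer corresponds to the last two dimensions of `X`: one time step and one feature.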

Let us train it with a vanilla LSTM. You can see the loss metric for the train and validation data, as shown in the plots.

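```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
import wandb
from wandb.keras import WandbCallback

# One LSTM layer reads the single time step; Dense(2) emits the next two numbers.
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(1, 1)))
model.add(Dense(2))
model.compile(optimizer='adam', loss='mse')

wandb.init(entity='ayush-thakur', project='dl-question-bank')
model.fit(X, Y, epochs=1000, validation_split=0.2, batch_size=3, callbacks=[WandbCallback()])
```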

When predicting on test data where the input is 10, we expect the model to generate the sequence [11, 12]. The model predicted [[11.00657 12.138181]], which is close to the expected values.

In many-to-one sequence problems, we have a sequence of data as input, and we have to predict a single output. Sentiment analysis or text classification is one such use case.

We have created a toy dataset, as shown in the image. The input has 15 samples with three time steps, and the output is the sum of the values in each step.
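A minimal sketch of such a dataset is shown here. It assumes the samples are consecutive integers from 1 to 45 split into 15 windows of three, which may differ from the values in the original figure; the variable names are illustrative.

```python
import numpy as np

# Assumption: 45 consecutive integers, split into 15 samples of 3 time steps each.
values = np.arange(1, 46, dtype="float32")

# Shape (samples, time_steps, features) = (15, 3, 1).
X = values.reshape((15, 3, 1))

# The target for each sample is the sum over its three time steps, shape (15, 1).
Y = X.sum(axis=1)

print(X.shape, Y.shape)  # (15, 3, 1) (15, 1)
```

With this layout, the first sample [1, 2, 3] maps to the target 6.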

Let us train it with a vanilla LSTM. You can see the loss metric for the train and validation data, as shown in the plots.

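```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
import wandb
from wandb.keras import WandbCallback

# Reset Keras state left over from the previous experiment.
tf.keras.backend.clear_session()

model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(3, 1)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

wandb.init(entity='ayush-thakur', project='dl-question-bank')
history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1, callbacks=[WandbCallback()])
```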

When predicting on test data, the input is a sequence of three time steps: [50, 51, 52]. The expected output is the sum of the values, which is 153. The model predicted [[152.9253]], which is off by less than 0.1 from the expected value.

Many-to-many sequence learning can be used for machine translation, where the input sequence is in one language and the output sequence is in another. It can also be used for video classification, where the input sequence is the feature representation of each frame of the video at different time steps.

An encoder-decoder network is commonly used for many-to-many sequence tasks. Here, encoder-decoder is just a fancy name for a neural architecture with two LSTM layers: one that encodes the input sequence and one that decodes the output sequence.

In this toy experiment, we have created a dataset shown in the image below. The input has 20 samples with three time steps each, while the output has the next three consecutive multiples of 5.
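One way to generate data of this shape is sketched here. It assumes the inputs are the multiples of 5 from 5 to 300, which matches the description but may not match the original figure exactly; the names `inputs` and `targets` are illustrative.

```python
import numpy as np

# Assumption: inputs are the multiples of 5 from 5 to 300 (60 values).
inputs = np.arange(5, 301, 5, dtype="float32")

# Targets are the same series shifted by three steps: 20 to 315 (60 values).
targets = np.arange(20, 316, 5, dtype="float32")

# 20 samples, 3 time steps, 1 feature for both input and output.
X = inputs.reshape((20, 3, 1))
Y = targets.reshape((20, 3, 1))

print(X.shape, Y.shape)  # (20, 3, 1) (20, 3, 1)
```

Under this assumption, the first sample [5, 10, 15] maps to [20, 25, 30]. Unlike the earlier examples, the target keeps a time-step axis because the model emits one value per output step.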

Let us train it with a vanilla Encoder-Decoder architecture. You can see the loss metric for the train and validation data, as shown in the plots.

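```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, RepeatVector, TimeDistributed
import wandb
from wandb.keras import WandbCallback

model = Sequential()
# encoder layer
model.add(LSTM(100, activation='relu', input_shape=(3, 1)))
# repeat vector
model.add(RepeatVector(3))
# decoder layer
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')

# summary() prints the table itself, so it does not need to be wrapped in print().
model.summary()

wandb.init(entity='ayush-thakur', project='dl-question-bank')
history = model.fit(X, Y, epochs=1000, validation_split=0.2, verbose=1, batch_size=3, callbacks=[WandbCallback()])
```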
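To make the role of `RepeatVector(3)` concrete, the NumPy sketch below mimics its effect on tensor shapes. It is not the Keras implementation; the random `encoded` array is only a stand-in for the encoder's output, and the batch size of 2 is an arbitrary choice.

```python
import numpy as np

batch, units, out_steps = 2, 100, 3

# Stand-in for the encoder LSTM output: one 100-dimensional vector per sample.
# The values are random and purely illustrative.
encoded = np.random.default_rng(0).normal(size=(batch, units))

# RepeatVector(3) copies that vector once per output time step,
# turning (batch, units) into (batch, out_steps, units).
repeated = np.repeat(encoded[:, np.newaxis, :], out_steps, axis=1)

print(encoded.shape)   # (2, 100)
print(repeated.shape)  # (2, 3, 100)
```

The decoder LSTM, with `return_sequences=True`, then produces one hidden vector per time step, and `TimeDistributed(Dense(1))` maps each of those to a single value, giving an output of shape (batch, 3, 1).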

When predicting on test data, the input is a sequence of three time steps: [300, 305, 310]. The expected output is the sequence of the next three consecutive multiples of five, [315, 320, 325]. The model predicted [[[315.29865], [321.0397 ], [327.0003 ]]], which is close to the expected values.
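To put a number on "close", the short sketch below computes the absolute and relative error at each output step from the values quoted above. The variable names are illustrative.

```python
import numpy as np

expected = np.array([315.0, 320.0, 325.0])
predicted = np.array([315.29865, 321.0397, 327.0003])  # prediction quoted in the report

# Per-step absolute error and relative error in percent.
abs_error = np.abs(predicted - expected)
rel_error_pct = 100 * abs_error / expected

for step, (a, r) in enumerate(zip(abs_error, rel_error_pct), start=1):
    print(f"step {step}: abs error {a:.3f}, relative error {r:.2f}%")
```

The absolute error grows from roughly 0.3 at the first step to roughly 2.0 at the third, so the prediction drifts further from the target the later the step in the decoded sequence, while staying under 1% in relative terms.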

These are some of the resources that I found relevant for my own understanding of these concepts.

(Solving Sequence Problems with LSTM in Keras blog post by Usman Malik was used to come up with code snippets.)