A Guide to Multi-Label Classification on Keras
In this article, we explore the necessary ingredients for multi-label classification, including multi-label binarization, output activation, and loss functions.
Most of us are familiar with multi-class classification problems: object classification (distinguishing cats from dogs from birds), sentiment analysis (is this Tweet positive or negative?), and so on. In all these problems, the model must assign the input to exactly one of several possible classes.
In this short article, we'll look into two simple yet crucial ingredients for multi-label classification in Keras.
The output of a neural network classifier is a probability distribution that approximates the true label distribution. In multi-class classification, the true label usually corresponds to a single integer. In multi-label classification, however, an input can be associated with multiple classes at once: a movie poster, for example, can belong to several genres.
Let's take a quick look into a few of the key ingredients of multi-label classification. Here's what we'll be covering:
Table of Contents
- Multi-Label Binarizer
- Output Activation and Loss Function
- Resources
Let's dive in!
Multi-Label Binarizer
We usually one-hot encode our labels for multi-class classification problems. In one-hot encoding, categorical variables are represented as binary vectors: each categorical value is first mapped to an integer, and each integer is then represented as a binary vector that is all zeros except for a 1 at the index of that integer.
In multi-label classification problems, however, any number of classes can be associated with a single input, and the classes are not mutually exclusive. So instead of one-hot encoding, we use multi-label binarization: the label (which can contain multiple classes) is transformed into a binary vector that is all zeros except for a 1 at the index of each class present in the label.
We can easily implement this as shown below:
from sklearn.preprocessing import MultiLabelBinarizer

# Create the MultiLabelBinarizer object
mlb = MultiLabelBinarizer()

# Binarize the multi-label targets
mlb.fit_transform(y)
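For example, with a small made-up set of movie-genre labels (the genres here are purely illustrative), the binarizer produces one column per class, sorted alphabetically:

from sklearn.preprocessing import MultiLabelBinarizer

# Made-up labels: each sample can belong to any number of genres.
y = [("action", "comedy"), ("drama",), ("action", "drama", "thriller")]

mlb = MultiLabelBinarizer()
print(mlb.fit_transform(y))
# [[1 1 0 0]
#  [0 0 1 0]
#  [1 0 1 1]]
print(mlb.classes_)
# ['action' 'comedy' 'drama' 'thriller']

The fitted mlb object is worth keeping around: its inverse_transform method maps predicted binary vectors back to tuples of class names.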
Output Activation and Loss Function
Let's first review a simple model capable of doing multi-label classification implemented in Keras.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(X_train.shape[1],)))
model.add(Dropout(0.1))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(y_train.shape[1], activation='sigmoid'))  # <-- Notice the activation in the final layer.
model.compile(loss='binary_crossentropy',  # <-- Notice the loss function.
              optimizer='adam')
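At inference time, each sigmoid output is an independent per-class probability. Here is a minimal sketch of how you might recover the predicted label set, assuming a trained model, the fitted mlb binarizer from above, and the common (but tunable) 0.5 threshold:

# Threshold each per-class probability to get a binary label vector.
probs = model.predict(X_test)        # shape: (n_samples, n_classes)
y_pred = (probs > 0.5).astype(int)   # 0.5 is a common default; tune per class if needed

# Map binary vectors back to class names with the fitted binarizer.
labels = mlb.inverse_transform(y_pred)

So why sigmoid and binary_crossentropy rather than the usual softmax and categorical_crossentropy?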
- For a multi-class classification problem, we use the softmax activation function. We want to maximize the probability of a single class, and softmax ensures that the output probabilities sum to one. In the multi-label setting, however, we use the sigmoid activation function in the output layer. Because sigmoid is applied to each output neuron independently, the model can assign a high probability to all of the classes, some of them, or none of them.
- For a multi-class classification problem, we often use the categorical_crossentropy loss, since we want to approximate the true data distribution (where exactly one class is true). In a multi-label setting, however, we formulate the objective as a set of binary classifiers: each of the y_train.shape[1] neurons in the output layer performs a one-vs-all classification for its class. binary_crossentropy is the natural loss for binary classification and is therefore the one used for multi-label classification. The sketch below makes the contrast concrete.
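Here is a minimal NumPy sketch, using made-up logits and targets for three classes, of how softmax and sigmoid behave on the same raw outputs and how binary cross-entropy scores the sigmoid probabilities class by class:

import numpy as np

logits = np.array([2.0, 1.0, 0.5])  # made-up raw outputs for three classes

# Softmax: probabilities compete and sum to 1 -- suitable when exactly one class is true.
softmax = np.exp(logits) / np.exp(logits).sum()
print(softmax.round(2))  # [0.63 0.23 0.14], sums to 1.0

# Sigmoid: each class gets an independent probability -- any subset of classes can be "on".
probs = 1.0 / (1.0 + np.exp(-logits))
print(probs.round(2))    # [0.88 0.73 0.62]

# binary_crossentropy averages a one-vs-all binary cross-entropy over the classes.
y_true = np.array([1.0, 0.0, 1.0])  # made-up multi-label target
bce = -np.mean(y_true * np.log(probs) + (1 - y_true) * np.log(1 - probs))
print(bce.round(4))      # 0.6381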
Resources
In this article, I have only touched on the key ingredients, so it is best suited for readers with some prior experience in the topic. If you are new to it, here are some useful reads: