Image Classification with Keras
Build an image classification pipeline
In this report, we'll build a pipeline to train an image classifier in Keras and gain some intuition around the hyperparameters that we can tune to optimize the performance of our classifier.
Run the example in Colab →
1. Data Pipeline
We'll use the CIFAR-10 dataset for this example. First let's download the dataset.
## import
from tensorflow.keras.datasets import cifar10
## download
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
We will use TensorFlow's `tf.data` API to create our input pipeline.
import tensorflow as tf

BATCH_SIZE = 128  ## example value -- batch size is a hyperparameter you can tune

def preprocess_image(image, label):
    ## assumed preprocessing step: scale pixel values from [0, 255] to [0, 1]
    return tf.cast(image, tf.float32) / 255.0, label

trainloader = tf.data.Dataset.from_tensor_slices((x_train, y_train))
trainloader = (
    trainloader
    .shuffle(1024)
    .map(preprocess_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    .batch(BATCH_SIZE)
    .prefetch(tf.data.experimental.AUTOTUNE)
)
# ... load test data ...
CLASS_NAMES = ["airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck"]
Each image in the dataset is a 32 x 32 pixel RGB image, and there are 10 classes.
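For completeness, the held-out test set can be wrapped in a similar pipeline. This is a minimal sketch that reuses the same `preprocess_image` and `BATCH_SIZE` from above; at evaluation time we skip the shuffle step.
## one possible test pipeline -- same preprocessing as training, but no shuffling
testloader = tf.data.Dataset.from_tensor_slices((x_test, y_test))
testloader = (
    testloader
    .map(preprocess_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    .batch(BATCH_SIZE)
    .prefetch(tf.data.experimental.AUTOTUNE)
)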
2. Model Pipeline
We'll use Keras' high-level API to build a simple classification model.
- We start with an input layer (`keras.layers.Input`), which takes in the images in our dataset and specifies the input shape.
- Since we are doing image classification, we add two convolutional layers (`keras.layers.Conv2D`). These layers extract relevant features from the input space while maintaining the spatial information contained in an image. You can learn more about CNNs here.
- Next up, we add either a MaxPooling (`keras.layers.MaxPooling2D`) or an AveragePooling (`keras.layers.AveragePooling2D`) layer to reduce the dimensionality of the feature maps we learned.
- After a couple of Conv and MaxPooling layers, we flatten the image using the `Flatten` layer. I used a `GlobalAveragePooling2D` (GAP) layer instead, as flattening creates a lot of parameters, especially for such a shallow network. That massive number of parameters can lead to overfitting in the fully connected layers. GAP helps counter over-parameterization, and thus overfitting. Learn more about `GlobalAveragePooling` here.
- Now that our conv block is in place, we can add a few fully connected layers. In the Keras ecosystem, we define fully connected layers using `keras.layers.Dense`.
- Finally, we have our output layer. The size and activation function of this layer depend on the task at hand. For image classification, we use a `Dense` layer with the number of output neurons equal to the number of classes, `NUM_CLASSES`.
- We then build our model using `keras.models.Model`. In Keras, we define a model either as `Sequential` or with the functional API, which allows us to build more complicated architectures. Our image classifier is built using the functional API. More on the Sequential vs. functional API here.
from tensorflow import keras

## CIFAR-10: 32 x 32 RGB images across 10 classes
IMG_SHAPE = 32
CHANNELS = 3
NUM_CLASSES = 10

def Model():
    inputs = keras.layers.Input(shape=(IMG_SHAPE, IMG_SHAPE, CHANNELS))
    x = keras.layers.Conv2D(filters=32,
                            kernel_size=(3, 3),
                            strides=(1, 1),
                            padding='valid',
                            activation='relu')(inputs)
    x = keras.layers.Conv2D(filters=32,
                            kernel_size=(3, 3),
                            strides=(1, 1),
                            padding='valid',
                            activation='relu')(x)
    x = keras.layers.MaxPooling2D(pool_size=2)(x)
    x = keras.layers.GlobalAveragePooling2D()(x)
    x = keras.layers.Dense(128, activation='relu')(x)
    x = keras.layers.Dense(32, activation='relu')(x)
    outputs = keras.layers.Dense(NUM_CLASSES, activation='softmax')(x)
    return keras.models.Model(inputs=inputs, outputs=outputs)
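To sanity-check the architecture, you can instantiate the model and print a summary; the per-layer output shapes and parameter counts make it easy to verify the 18,826 trainable parameters referenced in the experiment below.
model = Model()
model.summary()               ## per-layer output shapes and parameter counts
print(model.count_params())   ## total parameter count (18,826 for this baseline)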
3. Loss and Optimizer
In Keras, defining the loss and optimizer requires just one line of code.
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['acc'])
- Here we are using the `Adam` optimizer. In general, Adam and its variant Nadam converge faster, whereas a well-tuned Stochastic Gradient Descent (SGD) optimizer can outperform Adam while taking longer to converge. Learn more about optimizers in this amazing blog post by Sebastian Ruder.
- Our loss function is `sparse_categorical_crossentropy` because our labels are not one-hot encoded. If they were, we'd use `categorical_crossentropy` instead (see the sketch below). A good introduction to cross-entropy losses can be found here.
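For reference, here is a minimal sketch of the one-hot-encoded alternative, using `tf.keras.utils.to_categorical` to convert the integer labels:
from tensorflow.keras.utils import to_categorical

## convert integer labels to one-hot vectors, then switch the loss accordingly
y_train_onehot = to_categorical(y_train, num_classes=NUM_CLASSES)
y_test_onehot = to_categorical(y_test, num_classes=NUM_CLASSES)

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['acc'])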
4. Baseline Classifier
Next up, let's start training our model and track its performance with Weights & Biases.
We will train our classifier with two callbacks:
- Early stopping: This callback helps us avoid overfitting by stopping training when the training and validation loss diverge. More on early stopping here.
earlystoper = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=10,
    verbose=0,
    mode='auto',
    restore_best_weights=True  ## restore the weights from the best epoch on the validation set
)
- WandbCallback: This Keras callback automatically saves all the model performance metrics, predictions, and hyperparameters tracked in `model.fit`. Check out the official docs here. You can see the model metrics and predictions automatically logged by Weights & Biases by adding the callback as shown below.
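As a rough sketch (the project name is hypothetical and the epoch count is just an example), wiring both callbacks into `model.fit` might look like this:
import wandb
from wandb.keras import WandbCallback

wandb.init(project="keras-cifar10")   ## hypothetical project name

model.fit(trainloader,
          epochs=50,                  ## example value; early stopping usually ends training sooner
          validation_data=testloader,
          callbacks=[WandbCallback(), earlystoper])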
Experimenting with model hyperparameters
Next, we'll run a simple experiment to determine the effect of increasing model capacity on performance. Model capacity is characterized by the total number of parameters that the neural network needs to learn.
We will simply add two more convolutional layers to our baseline model. By doing so, we increase the depth of our classifier and add two more feature extractors that can learn higher-order features of the images. This increases the number of trainable parameters from 18,826 to 37,322.
x = keras.layers.Conv2D(filters=32,
                        kernel_size=(3, 3),
                        strides=(1, 1),
                        padding='valid',
                        activation='relu')(x)
x = keras.layers.Conv2D(filters=32,
                        kernel_size=(3, 3),
                        strides=(1, 1),
                        padding='valid',
                        activation='relu')(x)
Let's see the effect of increased depth, while keeping the same set of hyperparameters and regularizers.
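One way to run the comparison is to log the deeper variant as its own run so the two show up side by side in Weights & Biases. A sketch, where DeeperModel() is a hypothetical copy of Model() with the two extra convolutional layers added:
wandb.init(project="keras-cifar10", name="deeper-cnn")   ## hypothetical project/run names

deeper_model = DeeperModel()          ## hypothetical: Model() plus the two Conv2D layers above
deeper_model.compile(optimizer='adam',
                     loss='sparse_categorical_crossentropy',
                     metrics=['acc'])
deeper_model.fit(trainloader,
                 epochs=50,           ## example value
                 validation_data=testloader,
                 callbacks=[WandbCallback(), earlystoper])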
Observations
- With the addition of two new CNN layers, the model performance (training accuracy) increased by a margin of ~10%.
- The model quickly overfits with the addition of more trainable parameters (see the divergence between training and validation accuracy/loss for the purple model in the graphs above). Early stopping terminated the learning process for the new model sooner than it did for the baseline model.
- Even with the overfitting, the validation loss for the model with increased capacity (purple dashed line) is lower than that of the baseline (red dashed line).
As you can see, by logging your model performance in Weights & Biases, you can quickly try many different flavors of your model and find the one with the best performance. You can also debug your model by looking at the predictions themselves to see whether it classifies some classes better than others.
From here you can try out different experiments and see if you can improve the model performance. Some ideas for things to try:
- Change the parameters of the `Conv2D` or `Dense` layers.
- Increase the depth of the model until it starts overfitting after 2-3 epochs.
- Replace the `GlobalAveragePooling2D` layer with `Flatten` and see the effect; it would make a nice study (see the sketch after this list).
- Try your model with a different dataset and fine-tune it to do well on the new dataset.
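If you try the `Flatten` swap, a minimal sketch of the modified model (the name FlattenModel is hypothetical) could look like the following; flattening the 14 x 14 x 32 feature maps feeds 6,272 values into the first Dense layer, which alone adds roughly 800,000 parameters.
def FlattenModel():
    ## hypothetical variant of Model() that flattens instead of global-average-pooling
    inputs = keras.layers.Input(shape=(IMG_SHAPE, IMG_SHAPE, CHANNELS))
    x = keras.layers.Conv2D(32, (3, 3), activation='relu')(inputs)
    x = keras.layers.Conv2D(32, (3, 3), activation='relu')(x)
    x = keras.layers.MaxPooling2D(pool_size=2)(x)
    x = keras.layers.Flatten()(x)     ## replaces GlobalAveragePooling2D
    x = keras.layers.Dense(128, activation='relu')(x)
    x = keras.layers.Dense(32, activation='relu')(x)
    outputs = keras.layers.Dense(NUM_CLASSES, activation='softmax')(x)
    return keras.models.Model(inputs=inputs, outputs=outputs)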
When you do, we'd love to see you create your own reports documenting the results from your experiments.
Weights & Biases
Weights & Biases helps you keep track of your machine learning experiments. Use our tool to log hyperparameters and output metrics from your runs, then visualize and compare results and quickly share findings with your colleagues.
Get started in 5 minutes.