
EMNIST Classification

Image classification on the EMNIST/bymerge dataset.

Introduction

Image classification is a standard computer vision task: given an input image, we predict its class. It is one of the most widely studied tasks in deep learning.

In this report, we will build a simple image classifier, first using a fully connected neural network and then using a convolutional neural network. We will evaluate our classifiers using a confusion matrix.

This report is a quick introduction to using Weights and Biases and Reports. Reports let you organize visualizations, describe your findings, and share updates with collaborators.

You can use Reports for:

  • Notes: Add a graph with a quick note to yourself.
  • Collaboration: Share findings with your colleagues.
  • Work log: Track what you've tried, and plan next steps.

EMNIST Dataset - A Quick Investigation

Before we build our classifier let's do a quick investigation of the EMNIST dataset and make some decisions.

Check out the Colab notebook covering the EMNIST investigation here →

Points to note:

  • We are using the bymerge variant of the EMNIST dataset, in which lowercase letters like j, o, and i that look like their uppercase counterparts J, O, and I are merged into a single class.

  • The EMNIST images are provided flipped horizontally and rotated 90° anti-clockwise, which together amount to a transpose of the upright image. For ease of experimentation we don't want to use them in this configuration, so we transpose each image back (see the sketch after the label list below).

  • We have a total of 814,255 images. Each is 28×28 pixels in resolution with a single channel.

  • 697,932 of these images are training data.

  • 116,323 images are testing data.

  • We have 47 classes as shown:

LABELS = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 
          'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
          'a', 'b', 'd', 'e', 'f', 'g', 'h', 'n', 'q', 'r', 't']
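
To illustrate the orientation fix, here is a minimal sketch of loading the dataset with TensorFlow Datasets and undoing the stored transpose. The tfds name and split are from the dataset catalog; wiring this into your own pipeline may differ from the Colab notebook.

import tensorflow as tf
import tensorflow_datasets as tfds

def fix_orientation(image, label):
  # EMNIST stores each image flipped and rotated, which amounts to a
  # transpose of the upright image; transposing height and width undoes it
  image = tf.transpose(image, perm=[1, 0, 2])  # shape stays (28, 28, 1)
  return image, label

train_ds = tfds.load('emnist/bymerge', split='train', as_supervised=True)
train_ds = train_ds.map(fix_orientation)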

Make sure to check out the paper →




[Run set panel: 1 run]


Image Classifier - Fully Connected Neural Network

An image is a 2D grid of pixels, and those pixels exhibit spatial relationships. When designing an image classifier there are a few go-to architectural choices; convolutional neural network based classifiers are popular because they work well. However, we will start with a simpler image classifier built from a fully connected neural network.

We use the permutation-invariant setting, in which each 28×28 EMNIST image is treated as a 784-dimensional vector with no spatial structure; this is what motivates using an MLP instead of a CNN.
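
A minimal sketch of that flattening step, assuming the tf.data pipeline from the previous section:

def flatten_image(image, label):
  # treat each 28x28x1 image as a flat 784-dimensional vector,
  # discarding the spatial arrangement of pixels
  image = tf.reshape(image, [784])
  return image, label

train_ds = train_ds.map(flatten_image)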

Check out the Colab notebook covering model training with the dense network here →

Let's look at the model architecture.

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

def DenseModel():
  inputs = Input(shape=(784,))                           # flattened 28x28 image
  x = Dense(256, activation='relu')(inputs)
  x = Dense(128, activation='relu')(x)
  outputs = Dense(len(LABELS), activation='softmax')(x)  # 47 class probabilities

  return Model(inputs=inputs, outputs=outputs)

The code below shows how to use Weights and Biases to automatically log model metrics. Learn more about the W&B Keras integration here →

import tensorflow as tf
import wandb
from wandb.keras import WandbCallback

# initialize wandb run
wandb.init(project='my-emnist-classifier')

# hyperparameters
config = wandb.config
config.epochs = 70
config.learning_rate = 0.001

# model
tf.keras.backend.clear_session()
model = DenseModel()

# optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=config.learning_rate)

# compile (categorical_crossentropy expects one-hot labels)
model.compile(optimizer, 'categorical_crossentropy', metrics=['acc'])

# early stopping: halt when validation loss stops improving for 5 epochs
early_stopper = tf.keras.callbacks.EarlyStopping(patience=5)

# train; trainloader and testloader are the tf.data pipelines from the notebook
model.fit(trainloader,
          epochs=config.epochs,
          validation_data=testloader,
          callbacks=[WandbCallback(),
                     early_stopper])
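
For completeness, here is a rough sketch of what trainloader and testloader could look like. The actual pipeline lives in the Colab notebook, so the batch size, shuffle buffer, and reuse of the fix_orientation and flatten_image helpers from earlier are assumptions.

def preprocess(image, label):
  image = tf.cast(image, tf.float32) / 255.0    # scale pixels to [0, 1]
  image, label = fix_orientation(image, label)  # undo the stored transpose
  image, label = flatten_image(image, label)    # 784-D vector for the dense model
  label = tf.one_hot(label, len(LABELS))        # one-hot for categorical_crossentropy
  return image, label

train_ds, test_ds = tfds.load('emnist/bymerge',
                              split=['train', 'test'],
                              as_supervised=True)
trainloader = (train_ds.map(preprocess)
                       .shuffle(10_000)
                       .batch(128)
                       .prefetch(tf.data.AUTOTUNE))
testloader = (test_ds.map(preprocess)
                     .batch(128)
                     .prefetch(tf.data.AUTOTUNE))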

Let's train our Dense model and look at the results.

  • Our model achieved ~87% validation accuracy, while it reached ~90% training accuracy.
  • Training was terminated after 10 epochs as the model started overfitting. We trained with an early-stopping callback, allowing an upper bound of 70 epochs and a patience of 5 epochs.



[Run set panel: 1 run]


Image Classifier - Convolutional Neural Network

Now let's build a simple image classifier that uses a CNN as the primary feature extractor. Using a CNN has three main benefits:

  • Sparse Interactions
  • Parameter Sharing
  • Equivariant Representation

You can learn more about these benefits in Section 9.2 of the Deep Learning book.

We will focus on building our image classifier using CNN. Let's look at the model architecture.

from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.models import Model

def CNNModel():
  inputs = Input(shape=(28,28,1))
  # three conv + pool blocks shrink the 28x28 input down to a 1x1x64 feature map
  x = Conv2D(32, (3,3), activation='relu')(inputs)
  x = MaxPooling2D(pool_size=2)(x)
  x = Conv2D(64, (3,3), activation='relu')(x)
  x = MaxPooling2D(pool_size=2)(x)
  x = Conv2D(64, (3,3), activation='relu')(x)
  x = MaxPooling2D(pool_size=2)(x)
  x = Flatten()(x)
  outputs = Dense(len(LABELS), activation='softmax')(x)

  return Model(inputs=inputs, outputs=outputs)

There's nothing fancy here. However, it's best practice to start with a simple model and have a model training pipeline ready. With a baseline in place, we can increase model complexity, play with hyperparameters, etc. Weights and Biases will log everything in one place so that you can focus more on building your models.

The code to train this model barely changes from the code we used to train our fully connected network.

Check out the Colab notebook covering model training with the CNN here →

Let's train our model and infer our results.

  • We are comparing the metrics from the fully connected and convolutional networks.
  • We can clearly see that the CNN didn't overfit as quickly: training was terminated after 19 epochs, compared to 10 epochs for the fully connected network.
  • The gap between training and validation metrics is smaller for the convolutional model than for the fully connected one. This is because the convolutional network has sparser interactions (fewer trainable parameters); see the sketch below this list.
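
To make the parameter comparison concrete, a quick sketch (assuming the two model functions defined above):

# compare trainable parameter counts of the two architectures
dense_params = DenseModel().count_params()
cnn_params = CNNModel().count_params()
print(f'Dense parameters: {dense_params:,}')
print(f'CNN parameters:   {cnn_params:,}')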

Using WandbCallback we can automatically log predictions on a small validation subset. The result is shown in the Prediction on Images chart.
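
A sketch of how that logging can be configured; x_val and y_val stand in for a small NumPy validation subset and are not variables from the notebooks:

# ask the callback to log sample image predictions each epoch
wandb_cb = WandbCallback(data_type='image',
                         validation_data=(x_val, y_val),
                         labels=LABELS)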




[Run set panel: 2 runs]


Confusion Matrix

Finally, we will evaluate our model using the confusion matrix.

  • We can clearly see that the model performs similarly to our fully connected network.
  • The model tends to confuse visually similar pairs of classes.
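
One way to log this chart is wandb.plot.confusion_matrix; a minimal sketch, assuming the trained model and testloader from above (labels arrive one-hot because we compiled with categorical_crossentropy):

import numpy as np

# collect ground-truth and predicted class indices over the test set
y_true, y_pred = [], []
for images, labels in testloader:
  probs = model.predict(images)
  y_pred.extend(np.argmax(probs, axis=-1))
  y_true.extend(np.argmax(labels, axis=-1))  # labels are one-hot

wandb.log({'confusion_matrix': wandb.plot.confusion_matrix(
    y_true=y_true, preds=y_pred, class_names=LABELS)})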



[Run set panel: 1 run]


Further Reading

This report introduced Weights and Biases and Reports. Tracking experiments is one thing, but documenting them efficiently takes practice.

Here are a few resources that might be helpful:

I have provided two extra notebooks that train the fully connected and convolutional networks with class weights. 😉