Kaggle License Plate Detection

A simple algorithm and a simpler dataset for object detection on Kaggle. Made by Aritra Roy Gosthipaty using Weights & Biases.

Introduction

In this report, we will talk about a simple algorithm for object detection and a much simpler dataset to dive into. While searching for datasets on Kaggle, I came across this one on vehicle license plates. The dataset was a JSON file that had the links to images of different vehicles and the normalised bounding-box coordinates of the license plates. For a beginner, extracting the data from a CSV or a JSON file can be tricky and pretty exhausting. In my opinion, people learning a deep learning algorithm should be provided with a simple enough dataset to try their hands on. Sometimes extracting, cleaning, and processing the data takes up a lot of time and diminishes the interest in trying out a brand-new algorithm.

This led me to process the dataset, build an enhanced version, and upload it to Kaggle. Here I will go through all the steps that matter for the data pipeline and showcase a small object detection algorithm that can serve as a first approach to the dataset.

Data

The dataset on vehicle license plates has a JSON file named Indian_Number_plates.json. Reading the JSON file with pandas as a dataframe reveals the kind of data it holds. There are three columns, namely content, annotation, and extras. Our main concern is the two columns content and annotation. The content field consists of the URLs of the vehicle images, while the annotation field consists of a list of metadata and the coordinates of the license plates.
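A minimal sketch of that first read (assuming the file is line-delimited JSON, which is how this kind of annotation export usually ships; drop lines=True if it is a plain JSON array):

import pandas as pd

# Read the annotation file into a dataframe.
df = pd.read_json("Indian_Number_plates.json", lines=True)
print(df.columns)  # content, annotation, extras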

A single annotation looks something like this:

"annotation": [
               {"label": ["number_plate"],
                "notes": "",
                "points": [
                           {"x": 0.7220843672456576,
                            "y": 0.5879828326180258},
                           {"x":0.8684863523573201,
                            "y":0.6888412017167382}
                           ],
                "imageWidth":806,
                "imageHeight":466
                }
               ]

We are mainly concerned with the points field. The list of points holds the normalized corners of the bounding box (top-left and bottom-right) that we need to extract.

Processing


We will process the JSON file in the following two steps:

  1. Download the images:

import tensorflow as tf

# Download every image URL in the "content" column into /kaggle/working/Cars.
for counter, (index, row) in enumerate(df.iterrows()):
    path = tf.keras.utils.get_file(
        '/kaggle/working/Cars/car{}.jpg'.format(counter),
        row["content"]
    )

Using tf.keras.utils.get_file, I downloaded all the images present in the dataframe. They are stored in the /kaggle/working/Cars directory.

  2. Create another dataframe that has the coordinates only:

# Collect, for each row, the image name and the four box coordinates.
for index, row in df.iterrows():
    data_points = row["annotation"]
    # Reuse the car{index}.jpg naming from the download step.
    dataset["image_name"].append('car{}.jpg'.format(index))
    dataset["top_x"].append(data_points[0]["points"][0]["x"])
    dataset["top_y"].append(data_points[0]["points"][0]["y"])
    dataset["bottom_x"].append(data_points[0]["points"][1]["x"])
    dataset["bottom_y"].append(data_points[0]["points"][1]["y"])

This way a dataframe is made with the following fields: image_name, top_x, top_y, bottom_x, and bottom_y.

After downloading the images and building the dataframe, I logged them and created a kernel dataset.
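As a minimal sketch (the file name license_plates.csv is my placeholder, not necessarily the one in the kernel), persisting the coordinate dictionary looks like this:

import pandas as pd

# Persist the coordinate dictionary as the CSV of the kernel dataset.
pd.DataFrame(dataset).to_csv("license_plates.csv", index=False)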

The link to the enhanced dataset: https://www.kaggle.com/aritrag/license

This dataset has the following changes:

  1. The images are all pre-downloaded, hence the time to fetch them is cut short.
  2. The dataset has a CSV file with only the coordinates of the license plates. We no longer need to parse the annotation field and then extract the coordinates from it.
  3. A starter notebook shows the input pipeline built with tf.data.Dataset (a sketch of such a pipeline follows this list).
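Here is a minimal sketch of such a pipeline. It assumes the CSV fields described above, images in a Cars/ directory, and the placeholder file name license_plates.csv; the starter notebook may differ in the details:

import pandas as pd
import tensorflow as tf

df = pd.read_csv("license_plates.csv")

def load_example(image_name, box):
    # Read, decode, and resize the image; scale pixels to [0, 1].
    image = tf.io.read_file(tf.strings.join(["Cars/", image_name]))
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, (224, 224)) / 255.0
    return image, box

boxes = df[["top_x", "top_y", "bottom_x", "bottom_y"]].values.astype("float32")
ds = tf.data.Dataset.from_tensor_slices((df["image_name"].values, boxes))
ds = ds.map(load_example).shuffle(256).batch(32).prefetch(tf.data.AUTOTUNE)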

The simple algorithm

I was following the University of Michigan's course on Computer Vision and was studying object detection. The idea is as simple as it gets, and it splits into two parts: the first deals with the feature extractor, and the second predicts the bounding box coordinates.

The feature extractor can be a Convolutional Neural Network, while the second part is a linear regressor. We just need to regress four points for every image that comes through the input pipeline. We then back-propagate the error between the predicted points and the ground truth points.

Simple, isn’t it? Let’s get this coded:

import tensorflow as tf

# Feature extractor: a stack of convolution and max-pooling layers.
i = tf.keras.layers.Input(shape=(224, 224, 3))
x = tf.keras.layers.Conv2D(64, (5, 5), activation='relu')(i)
x = tf.keras.layers.MaxPool2D()(x)
x = tf.keras.layers.Conv2D(128, (5, 5), activation='relu')(x)
x = tf.keras.layers.MaxPool2D()(x)
x = tf.keras.layers.Conv2D(256, (7, 7), activation='relu')(x)
x = tf.keras.layers.MaxPool2D()(x)
x = tf.keras.layers.Conv2D(512, (7, 7), activation='relu')(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)

# Regression head: four sigmoid outputs for the normalized box coordinates.
x = tf.keras.layers.Dense(256, activation='relu')(x)
x = tf.keras.layers.Dense(128, activation='relu')(x)
x = tf.keras.layers.Dense(64, activation='relu')(x)
o = tf.keras.layers.Dense(4, activation='sigmoid')(x)

model = tf.keras.Model(inputs=[i], outputs=[o])
model.compile(optimizer='adam', loss="binary_crossentropy", metrics=['accuracy'])

Now some pointers: the last layer is sigmoid activated, which not only normalizes our coordinates but also provides the non-linearity that our Multi-Layer Perceptron needs while it learns. The loss is the binary cross-entropy between the predicted and the true points. We also use a callback that helps us avoid overfitting on the data.
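The report does not name the callback, but a typical choice here would be early stopping on the validation loss; train_ds and val_ds below stand in for the training and validation splits:

# Assumed callback: stop training when the validation loss stops improving.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True
)
model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=[early_stop])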


Visualization

After training the model, it is time we evaluated it. model.evaluate() does tell us how our model fared, but isn't visualization a better way to see what our model did and how? To visualize an image with its bounding box, I wrote myself the piece of code shown below.

import cv2
import matplotlib.pyplot as plt

def show_img_bbox(img, label):
    img = img.numpy()
    # Scale the normalized coordinates back to the 224x224 image.
    y_hat = label.numpy() * 224
    xt, yt = int(y_hat[0]), int(y_hat[1])
    xb, yb = int(y_hat[2]), int(y_hat[3])
    # Draw the box from the top-left to the bottom-right corner.
    image = cv2.rectangle(img, (xt, yt), (xb, yb), (0, 0, 255), 3)
    plt.imshow(image)
    plt.show()
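A hypothetical usage, reusing the ds pipeline sketched earlier to look at one prediction:

# Visualize the model's prediction on a single example.
for img, label in ds.unbatch().take(1):
    pred = model.predict(tf.expand_dims(img, axis=0))[0]
    show_img_bbox(img, tf.constant(pred))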

With wandb.Image, bounding boxes were never easier. You need to pass the image and a dictionary of coordinates. The following code snippet builds the image with the bounding boxes for both the predicted coordinates and the true labels.

import wandb

def wandb_img(temp_img, temp_pred, temp_label):
    # Wrap the image with both the predicted and the ground-truth boxes.
    img = wandb.Image(
        temp_img,
        boxes={
            "predictions": {
                "box_data": [{
                    "position": {
                        "minX": float(temp_pred[0]),
                        "maxX": float(temp_pred[2]),
                        "minY": float(temp_pred[1]),
                        "maxY": float(temp_pred[3]),
                    },
                    "class_id": 1,
                }],
            },
            "ground_truth": {
                "box_data": [{
                    "position": {
                        "minX": float(temp_label[0]),
                        "maxX": float(temp_label[2]),
                        "minY": float(temp_label[1]),
                        "maxY": float(temp_label[3]),
                    },
                    "class_id": 1,
                }],
            }
        }
    )
    return img
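Logging the annotated image is then a single call, for example:

# Send the image with both boxes to the W&B dashboard.
wandb.log({"license_plate": wandb_img(temp_img, temp_pred, temp_label)})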

One can press the :gear: icon in the visualization and play with the boxes.


Further scope

While processing the dataset, I also created a kernel dataset that is way easier to use. Beginners in computer vision do not need to think about the input pipeline so much; they can dive straight into the algorithms they want to use and come up with something of their own. Here are some promising directions for this dataset:

  1. Think about a better objective function for the problem statement.
  2. Use a better architecture for the feature extractor.
  3. Use something other than the linear regressor for the bounding boxes.
  4. Use an Optical Character Recognition model on top of the detected license plate.
  5. Port the whole model to a mobile device so that it can be used by the traffic police.

Sayak Paul and I built a project around license plate detection on a mobile device. It was a fun project; one can find the link to the repository here. The application is made with Flutter, and the model is a TFLite model. I would like to thank Sayak da for his help with the project.

You can reach out to me on Twitter: @ariG23498. Comment below on what you are going to do with the dataset 😃