Kaggle License Plate Detection With Weights & Biases
In this article, we look at a simple algorithm for object detection, a simplified Kaggle dataset to go with it, and how to use Weights & Biases to visualize the results.
Created on November 2 | Last edited on November 17
In this article, we will talk about a simple algorithm for object detection and a much simpler dataset to dive into. While searching for datasets on Kaggle, I came across this one on vehicle license plates.
The dataset was a JSON file that held links to images of different vehicles along with the normalised bounding-box coordinates of their license plates. For a beginner, extracting data from a CSV or a JSON file can be tricky and pretty exhausting. In my opinion, people learning a deep learning algorithm should be given a dataset simple enough to try their hands on. Sometimes extracting, cleaning, and processing the data takes up so much time that it diminishes the interest in trying out a brand-new algorithm.
This led me to process the dataset, build an enhanced version of it, and upload it to Kaggle. Here I will walk through all the steps that matter for the data pipeline and showcase a small object detection algorithm that can serve as a first approach to the dataset.
Data
The dataset on vehicle license plates has a JSON file named Indian_Number_plates.json. Reading the JSON file with pandas as a dataframe reveals the kind of data it holds. There are three columns, namely content, annotation, and extras. Our main concerns are the two columns content and annotation: the content field holds the URLs of the vehicle images, while the annotation field holds a list of metadata and the coordinates of the license plates.
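Loading the file might look like the snippet below; note that lines=True is an assumption on my part that the file stores one JSON record per line, which is typical for this kind of annotation dump.

import pandas as pd

# Read the annotation file into a dataframe;
# lines=True assumes one JSON record per line.
df = pd.read_json("Indian_Number_plates.json", lines=True)
print(df.columns)  # content, annotation, extras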
A single annotation looks something like this:

"annotation": [{
    "label": ["number_plate"],
    "notes": "",
    "points": [
        {"x": 0.7220843672456576, "y": 0.5879828326180258},
        {"x": 0.8684863523573201, "y": 0.6888412017167382}
    ],
    "imageWidth": 806,
    "imageHeight": 466
}]
We are mainly concerned with the points field: its two entries are the top-left and bottom-right corners of the bounding box, normalised by the image width and height, and these are what we need to extract.
Processing
We will process the JSON file in the following two steps:
- Download the images:

counter = 0
for index, row in df.iterrows():
    # Fetch each image from its URL and cache it locally.
    path = tf.keras.utils.get_file(
        '/kaggle/working/Cars/car{}.jpg'.format(counter), row["content"])
    counter += 1
Using tf.keras.utils.get_file, I downloaded all the images referenced in the dataframe. They are stored in the /kaggle/working/Cars directory.
- Create another dataframe that holds the coordinates only:

dataset = {"image_name": [], "top_x": [], "top_y": [],
           "bottom_x": [], "bottom_y": []}
for index, row in df.iterrows():
    # `path` is the local file downloaded for this row
    # (both steps can live in a single loop).
    dataset["image_name"].append(path)
    data_points = row["annotation"]
    dataset["top_x"].append(data_points[0]["points"][0]["x"])
    dataset["top_y"].append(data_points[0]["points"][0]["y"])
    dataset["bottom_x"].append(data_points[0]["points"][1]["x"])
    dataset["bottom_y"].append(data_points[0]["points"][1]["y"])
This way a data frame is made with the following fields:
- image_name: the path where the image was downloaded; str
- top_x: the normalised (fractional) x-coordinate of the top-left bounding-box corner; float
- top_y: the normalised (fractional) y-coordinate of the top-left bounding-box corner; float
- bottom_x: the normalised (fractional) x-coordinate of the bottom-right bounding-box corner; float
- bottom_y: the normalised (fractional) y-coordinate of the bottom-right bounding-box corner; float
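Since the coordinates are stored as fractions of the image size, converting them back to pixels is a one-liner. Here is an illustrative helper; the name to_pixels is mine, not from the original kernel.

# Illustrative helper (not from the original kernel): convert the
# normalised corner coordinates back to pixel values.
def to_pixels(row, width, height):
    return (int(row["top_x"] * width), int(row["top_y"] * height),
            int(row["bottom_x"] * width), int(row["bottom_y"] * height))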
With the images downloaded and the data frame formed, I logged both and created a kernel dataset.
This dataset comes with the following changes:
- The images are already downloaded, so the time spent fetching them is cut short.
- The dataset ships with a CSV file containing only the license plate coordinates. We no longer need to parse the annotation field to extract them.
- A starter notebook demonstrates an input pipeline built with tf.data.Dataset, as sketched below.
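Here is a minimal sketch of such a pipeline. The CSV filename is an assumption on my part, and the 224×224 size matches the model used later.

import pandas as pd
import tensorflow as tf

df = pd.read_csv("indian_license_plates.csv")  # assumed filename
boxes = df[["top_x", "top_y", "bottom_x", "bottom_y"]].values.astype("float32")

def load_example(path, box):
    # Read, decode, and resize the image; scale pixels to [0, 1].
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, (224, 224)) / 255.0
    return image, box

ds = (tf.data.Dataset.from_tensor_slices((df["image_name"].values, boxes))
      .map(load_example, num_parallel_calls=tf.data.AUTOTUNE)
      .shuffle(256)
      .batch(32)
      .prefetch(tf.data.AUTOTUNE))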
The Simple Algorithm
I was following Michigan’s course on Computer Vision and studying object detection. The idea is as simple as it gets, and it splits into two parts: the first deals with the feature extractor, and the second predicts the bounding-box coordinates.
The feature extractor can be a convolutional neural network, while the second part is a linear regressor. We just need to regress four points for every image that comes through the input pipeline, then back-propagate the error between the predicted points and the ground-truth points.
Simple, isn’t it? Let’s get this thing coded:
i = tf.keras.layers.Input(shape=(224, 224, 3))
x = tf.keras.layers.Conv2D(64, (5, 5), activation='relu')(i)
x = tf.keras.layers.MaxPool2D()(x)
x = tf.keras.layers.Conv2D(128, (5, 5), activation='relu')(x)
x = tf.keras.layers.MaxPool2D()(x)
x = tf.keras.layers.Conv2D(256, (7, 7), activation='relu')(x)
x = tf.keras.layers.MaxPool2D()(x)
x = tf.keras.layers.Conv2D(512, (7, 7), activation='relu')(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dense(256, activation='relu')(x)
x = tf.keras.layers.Dense(128, activation='relu')(x)
x = tf.keras.layers.Dense(64, activation='relu')(x)
o = tf.keras.layers.Dense(4, activation='sigmoid')(x)

model = tf.keras.Model(inputs=[i], outputs=[o])
model.compile(optimizer='adam', loss="binary_crossentropy", metrics=['accuracy'])
Now some pointers: the last layer is sigmoid-activated. This not only keeps our coordinates normalised to [0, 1] but also provides the non-linearity that our multi-layer perceptron needs while it learns. The loss is the binary crossentropy between the predicted and the true points. We also use a callback to keep the model from overfitting the data, as sketched below.
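As a rough sketch of the training step: the report doesn’t name the callback it uses, so EarlyStopping below is my assumption, and ds/val_ds are the hypothetical training and validation pipelines from earlier.

# EarlyStopping is an assumption; the original report doesn't name its callback.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=5, restore_best_weights=True)

history = model.fit(ds, validation_data=val_ds,
                    epochs=50, callbacks=[early_stop])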
Visualization
After training the model, it is time to evaluate it. model.evaluate() does tell us how the model fared, but wouldn’t visualization be a better way to see what our model did and how? To visualize an image with its bounding box, I wrote the piece of code shown below.
import cv2
import matplotlib.pyplot as plt

def show_img_bbox(img, label):
    # Scale the normalised coordinates back to the 224x224 image.
    img = img.numpy()
    y_hat = label.numpy() * 224
    xt, yt = int(y_hat[0]), int(y_hat[1])
    xb, yb = int(y_hat[2]), int(y_hat[3])
    image = cv2.rectangle(img, (xt, yt), (xb, yb), (0, 0, 255), 3)
    plt.imshow(image)
    plt.show()
With wandb.Image, bounding boxes were never easier: you just pass the image and a dictionary of coordinates. The following code snippet builds the image with bounding boxes for both the predicted coordinates and the true labels.
import wandb

def wandb_img(temp_img, temp_pred, temp_label):
    img = wandb.Image(
        temp_img,
        boxes={
            "predictions": {
                "box_data": [{
                    "position": {
                        "minX": float(temp_pred[0]),
                        "maxX": float(temp_pred[2]),
                        "minY": float(temp_pred[1]),
                        "maxY": float(temp_pred[3]),
                    },
                    "class_id": 1,
                }],
            },
            "ground_truth": {
                "box_data": [{
                    "position": {
                        "minX": float(temp_label[0]),
                        "maxX": float(temp_label[2]),
                        "minY": float(temp_label[1]),
                        "maxY": float(temp_label[3]),
                    },
                    "class_id": 1,
                }],
            },
        },
    )
    return img
One can press the ⚙️ icon in the visualization and play with the boxes.
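A hedged usage sketch: assuming a batch from the pipeline built earlier, the helper above can be logged like this. The project name and variable names are illustrative, not from the original report.

# Illustrative logging loop; project name and batch variables are assumptions.
wandb.init(project="license-plate-detection")

for images, labels in ds.take(1):
    preds = model.predict(images)
    logged = [wandb_img(images[i].numpy(), preds[i], labels[i].numpy())
              for i in range(4)]
    wandb.log({"predictions": logged})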
Further Scope
While processing the data set, I also created a kernel dataset that is much easier to use. Beginners in computer vision no longer need to worry about the input pipeline; they can dive straight into the algorithms they want to try and come up with something of their own.
There are some promising directions to take with the dataset:
- Think about a better objective function for the problem statement.
- Use a better architecture for the feature extractor.
- Use something other than the linear regressor for the bounding boxes.
- Run an Optical Character Recognition model on top of the detected license plate.
- Port the whole model to a mobile device so it can be used by the traffic police.
Sayak Paul and I built a project around license plate detection on a mobile device, and it was a fun one. You can find the link to the repository here. The application is built with Flutter, and the model is a TFLite model. I would like to thank Sayak da for his help with the project.
You can reach out to me on Twitter: @ariG23498
Comment below with what you are going to do with the dataset 😃
Amazing report, Aritra. I loved the idea of taking a relatively hard dataset and uploading a cleaner version of it. Beginners will definitely find it easy to use.
Besides uploading it to Kaggle, you can also make it public using W&B Artifacts. That way the dataset is readily available on every platform.
Also, at its core, your license plate detection model is an object localization model. Correct me if I am wrong.
Here's my take on object localization: https://wandb.ai/wandb/object_localization/reports/Object-Localization-with-Keras-and-W-B--VmlldzoyNzA2Mzk
Thank you for sharing your work.
Tags: Intermediate, Computer Vision, Object Detection, Keras, Experiment, CNN, Github, Panels, Plots, Kaggle