Assignment 2 : Convolutional Neural Networks
Instructions
- The goal of this assignment is threefold: (i) train a CNN model from scratch and learn how to tune the hyperparameters and visualise filters (ii) finetune a pre-trained model just as you would do in many real world applications (iii) use an existing pre-trained model for a cool application.
- We strongly recommend that you work on this assignment in a team of size 2. Both the members of the team are expected to work together (in a subsequent viva both members will be expected to answer questions, explain the code, etc).
- Collaborations and discussions with other groups are strictly prohibited.
- You must use Python (numpy and pandas) for your implementation.
- You can use any and all packages from keras, pytorch, tensorflow
- You can run the code in a jupyter notebook on colab by enabling GPUs.
- You have to generate the report in the same format as shown below using wandb.ai. You can start by cloning this report using the clone option above. Most of the plots that we have asked for below can be (automatically) generated using the apis provided by wandb.ai
- You also need to provide a link to your github code as shown below. Follow good software engineering practices and set up a github repo for the project on Day 1. Please do not write all code on your local machine and push everything to github on the last day. The commits in github should reflect how the code has evolved during the course of the assignment.
- You have to check moodle regularly for updates regarding the assignment.
Problem Statement
In Part A and Part B of this assignment you will build and experiment with CNN based image classifiers using a subset of the iNaturalist dataset. In Part C you will take a pre-trained object detection model and use it for a novel application.
Part A: Training from scratch
Question 1 (5 Marks)
Build a small CNN model consisting of 5 convolution layers. Each convolution layer would be followed by a ReLU activation and a max pooling layer. Here is sample code for building one such conv-relu-maxpool block in keras.
from keras.models import Sequential
from keras.layers import Conv2D, Activation, MaxPooling2D
input_shape = (224, 224, 3)  # e.g. 224x224 RGB images
model = Sequential()
model.add(Conv2D(16, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
After 5 such conv-relu-maxpool blocks of layers you should have one dense layer followed by the output layer containing 10 neurons (1 for each of the 10 classes). The input layer should be compatible with the images in the iNaturalist dataset.
The code should be flexible such that the number of filters, size of filters and activation function in each layer can be changed. You should also be able to change the number of neurons in the dense layer.
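For illustration, here is a minimal sketch of such a flexible builder in Keras (the function and parameter names are ours and not necessarily those used in the actual main.py):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Activation, MaxPooling2D, Flatten, Dense

def build_cnn(input_shape=(224, 224, 3), num_filters=(32, 64, 128, 256, 512),
              filter_sizes=((3, 3),) * 5, pool_sizes=((2, 2),) * 5,
              activation='relu', dense_neurons=256, num_classes=10):
    """Builds 5 conv-activation-maxpool blocks followed by a dense and an output layer."""
    model = Sequential()
    for i, (f, k, p) in enumerate(zip(num_filters, filter_sizes, pool_sizes)):
        if i == 0:
            model.add(Conv2D(f, k, input_shape=input_shape))
        else:
            model.add(Conv2D(f, k))
        model.add(Activation(activation))
        model.add(MaxPooling2D(pool_size=p))
    model.add(Flatten())
    model.add(Dense(dense_neurons, activation=activation))
    model.add(Dense(num_classes, activation='softmax'))  # 10 output neurons, one per class
    return model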
(a) What is the total number of computations done by your network? (assume $m$ filters in each layer, each of size $k \times k$, and $n$ neurons in the dense layer)
(b) What is the total number of parameters in your network? (assume $m$ filters in each layer, each of size $k \times k$, and $n$ neurons in the dense layer)
Answer :-
The code is flexible: the number of filters, the size of the filters, and the activation function in each layer can all be changed.
These can be passed as command-line arguments to main.py.
--augmentation # Add if you need data augmentation.
--train_path #Path of the train data directory
--test_path #Path of the test data directory
--batch_size #Batch size
--learning_rate #Learning rate
--image_size # Image size -> height, width
--num_conv_layers #Number of Convolution-Pool blocks
--num_epochs #Number of epochs to train for
--num_filters #Number of filters in each convolution layer, space separated
--filter_size #Filter size in each convolution layer, comma separated
--pool_size #Pool size in each MaxPool layer
--dense_neurons #Neurons in the dense layer after all convolution layers
--batch_norm #Add if you need a batch norm layer
--dropout #Dropout rate, 0 indicates no dropout
Example :-
python main.py --augmentation --image_size 224 224 --num_conv_layers 5 \
--num_filters 32 64 128 256 512 --filter_size 11,11 7,7 5,5 3,3 2,2 \
--pool_size 2,2 2,2 2,2 2,2 2,2 --dense_neurons 256 --dropout 0.2 \
--batch_norm --learning_rate 0.00001 --batch_size 64
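A minimal argparse sketch for parsing these flags (a sketch only; the defaults shown are illustrative and the actual parsing lives in main.py):

import argparse

def parse_args():
    p = argparse.ArgumentParser(description="Train a small CNN on iNaturalist")
    p.add_argument('--augmentation', action='store_true', help='Enable data augmentation')
    p.add_argument('--train_path', type=str, default='data/train')
    p.add_argument('--test_path', type=str, default='data/test')
    p.add_argument('--batch_size', type=int, default=64)
    p.add_argument('--learning_rate', type=float, default=1e-4)
    p.add_argument('--image_size', type=int, nargs=2, default=[224, 224])
    p.add_argument('--num_conv_layers', type=int, default=5)
    p.add_argument('--num_epochs', type=int, default=20)
    p.add_argument('--num_filters', type=int, nargs='+', default=[32, 64, 128, 256, 512])
    p.add_argument('--filter_size', type=str, nargs='+', default=['3,3'] * 5,
                   help='Comma-separated height,width per conv layer')
    p.add_argument('--pool_size', type=str, nargs='+', default=['2,2'] * 5)
    p.add_argument('--dense_neurons', type=int, default=256)
    p.add_argument('--batch_norm', action='store_true')
    p.add_argument('--dropout', type=float, default=0.0)
    return p.parse_args()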
Number of computations done by a network = number of multiplications and additions.
Layer | Input | Output | Number of Parameters | Number of operations |
---|---|---|---|---|
Block1_Conv2D | $(H,\ W,\ 3)$ | $(H-(k-1),\ W-(k-1),\ m)$ | $3mk^2$ | $\big((H-(k-1))(W-(k-1))\,m\big)\cdot 3k^2 + (H-(k-1))(W-(k-1))\,m$ |
Block1_MaxPool | $(H-(k-1),\ W-(k-1),\ m)$ | $(H-2(k-1),\ W-2(k-1),\ m)$ | 0 | $(H-(k-1))(W-(k-1))\,m$ |
Block2_Conv2D | $(H-2(k-1),\ W-2(k-1),\ m)$ | $(H-3(k-1),\ W-3(k-1),\ m)$ | $m^2k^2$ | $\big((H-3(k-1))(W-3(k-1))\,m\big)\cdot mk^2 + (H-3(k-1))(W-3(k-1))\,m$ |
Block2_MaxPool | $(H-3(k-1),\ W-3(k-1),\ m)$ | $(H-4(k-1),\ W-4(k-1),\ m)$ | 0 | $(H-3(k-1))(W-3(k-1))\,m$ |
Block3_Conv2D | $(H-4(k-1),\ W-4(k-1),\ m)$ | $(H-5(k-1),\ W-5(k-1),\ m)$ | $m^2k^2$ | $\big((H-5(k-1))(W-5(k-1))\,m\big)\cdot mk^2 + (H-5(k-1))(W-5(k-1))\,m$ |
Block3_MaxPool | $(H-5(k-1),\ W-5(k-1),\ m)$ | $(H-6(k-1),\ W-6(k-1),\ m)$ | 0 | $(H-5(k-1))(W-5(k-1))\,m$ |
Block4_Conv2D | $(H-6(k-1),\ W-6(k-1),\ m)$ | $(H-7(k-1),\ W-7(k-1),\ m)$ | $m^2k^2$ | $\big((H-7(k-1))(W-7(k-1))\,m\big)\cdot mk^2 + (H-7(k-1))(W-7(k-1))\,m$ |
Block4_MaxPool | $(H-7(k-1),\ W-7(k-1),\ m)$ | $(H-8(k-1),\ W-8(k-1),\ m)$ | 0 | $(H-7(k-1))(W-7(k-1))\,m$ |
Block5_Conv2D | $(H-8(k-1),\ W-8(k-1),\ m)$ | $(H-9(k-1),\ W-9(k-1),\ m)$ | $m^2k^2$ | $\big((H-9(k-1))(W-9(k-1))\,m\big)\cdot mk^2 + (H-9(k-1))(W-9(k-1))\,m$ |
Block5_MaxPool | $(H-9(k-1),\ W-9(k-1),\ m)$ | $(H-10(k-1),\ W-10(k-1),\ m)$ | 0 | $(H-9(k-1))(W-9(k-1))\,m$ |
- Assuming the input image has dimensions $(H, W, 3)$.
- Assuming a stride of 1 and padding of 0 at each layer (Conv as well as MaxPool).
- Assuming the max function can take any number of inputs and each max operation counts as one computation.
- Number of operations at any conv layer = applying the conv filters + applying the activation function (last column of the table).
- Applying the activation function to each element is assumed to be one computation.
- After the last block, a flattening layer is applied.
Dense layer
Layer | Input | Output | Number of Parameters | Number of operations |
---|---|---|---|---|
FC_layer1 | $(H_1,\ 1)$ | $(H_2,\ 1)$ | $H_2 H_1 + H_2$ | $(H_2 H_1 + H_2) + H_2$ |
Output_layer1 | $(H_2,\ 1)$ | $(10,\ 1)$ | $10 H_2 + 10$ | $(10 H_2 + 10) + 10$ |
- Here $H_1$ is the size of the flattened output of the last maxpool block and $H_2 = n$ is the number of neurons in the dense layer.
- Number of operations at each layer = matrix operations + applying the activation function.
- Total computations: sum of the entries in the "Number of operations" column of both tables.
- Total parameters: sum of the entries in the "Number of Parameters" column of both tables.
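As a sanity check on these formulas, one can instantiate the network for concrete values of m, k and n and compare against Keras' per-layer parameter counts (a minimal sketch; note that Keras adds one bias per filter/neuron, which the table above omits, and its default MaxPooling2D stride equals the pool size rather than the stride-1 pooling assumed in the table):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Activation, MaxPooling2D, Flatten, Dense

# Illustrative setting: m filters of size k x k in every conv layer, n dense neurons.
m, k, n = 64, 3, 256
model = Sequential()
for i in range(5):
    kwargs = {'input_shape': (224, 224, 3)} if i == 0 else {}
    model.add(Conv2D(m, (k, k), **kwargs))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(n, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.summary()  # per-layer parameter counts to compare against the table above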
Question 2 (10 Marks)
You will now train your model using the iNaturalist dataset. The zip file contains a train and a test folder. Set aside 10% of the training data for hyperparameter tuning. Make sure each class is equally represented in the validation data. Do not use the test data for hyperparameter tuning.
Using the sweep feature in wandb find the best hyperparameter configuration. Here are some suggestions but you are free to decide which hyperparameters you want to explore
- number of filters in each layer : 32, 64, ...
- filter organisation: same number of filters in all layer, doubling in each subsequent layer, halving in each subsequent layer, etc
- data augmentation (easy to do in keras): Yes, No
- dropout: 20%, 30% (btw, where will you add dropout? you should read up a bit on this)
- batch normalisation: Yes, No
Based on your sweep please paste the following plots which are automatically generated by wandb:
- accuracy v/s created plot (I would like to see the number of experiments you ran to get the best configuration).
- parallel co-ordinates plot
- correlation summary table (to see the correlation of each hyperparameter with the loss/accuracy)
Also write down the hyperparameters and their values that you sweeped over. Smart strategies to reduce the number of runs while still achieving a high accuracy would be appreciated. Write down any unique strategy that you tried.
Answer :-
We have set aside 10% of the training data for validation.
For data augmentation, we use the following transformations (a minimal Keras sketch follows the list) :-
- rescale = 1./255,
- horizontal_flip = True,
- rotation_range = 30,
- shear_range = 0.2,
- zoom_range = [0.75, 1.75],
- width_shift_range = 0.2,
- height_shift_range = 0.2,
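A minimal Keras sketch of this pipeline ('data/train' is a placeholder path; the actual code is in main.py):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255, horizontal_flip=True, rotation_range=30, shear_range=0.2,
    zoom_range=[0.75, 1.75], width_shift_range=0.2, height_shift_range=0.2,
    validation_split=0.1)  # 10% of the training data held out for validation
# Validation images are only rescaled, not augmented.
val_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.1)

train_gen = train_datagen.flow_from_directory(
    'data/train', target_size=(224, 224), batch_size=64,
    class_mode='categorical', subset='training')
val_gen = val_datagen.flow_from_directory(
    'data/train', target_size=(224, 224), batch_size=64,
    class_mode='categorical', subset='validation')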
We have kept the image size as 224x224 for running the sweep.
Therefore, we reshape all the images to this shape before training the model.
Our network has an input layer of 224x224x3 (3 channel RGB image of 224x224 size).
However, other image shapes will also work fine.
We kept the ReLU activation for each layer as the default choice and chose Adam as the optimization algorithm.
These were not included in the sweep since there were already multiple parameters to sweep over, which would otherwise result in a very large number of combinations.
Before starting with the sweep, we experimented with multiple batch sizes [32, 64, 128].
Batch size 128 trained faster but the metrics were not that good, so that choice was discarded.
Batch size 32 gave better metrics, but training was slower, so that choice was also discarded.
Therefore, batch size 64 was chosen to balance this tradeoff.
The rest of the hyperparameters were variables in the sweep.
Configuration for the sweep :-
- num_epochs : [20, 30]
- num_filters :
  - [32, 64, 128, 256, 512] # doubles the number of filters in each subsequent layer
  - [64, 64, 64, 64, 64] # keeps the number of filters the same in every layer
  - [512, 256, 128, 64, 32] # halves the number of filters in each subsequent layer
- filter_size :
  - [(7,7), (7,7), (5,5), (3,3), (2,2)] # larger filters in shallow layers, smaller in deeper layers
  - [(2,2), (3,3), (5,5), (7,7), (7,7)] # smaller filters in shallow layers, larger in deeper layers
  - [(3,3), (3,3), (3,3), (3,3), (3,3)] # same filter shape throughout the network
- pool_size : [(2,2), (2,2), (2,2), (2,2), (2,2)] # pooling size was kept constant
- learning_rate : [1e-3, 1e-4]
- batch_norm : [True, False]
- dense_neurons : [128, 256, 512]
- augmentation : [True, False]
- dropout : [0, 0.2, 0.5]
The Bayesian sweep strategy was selected as it uses a Gaussian process surrogate to maximise the probability of improving upon the specified metric.
We perform the Bayesian sweep with the goal set to maximising the validation accuracy.
To reduce the number of experiments, we enabled early termination by setting the hyperband criterion in wandb.
We set min_iter to 10.
So after 10 epochs, the validation accuracy of the current run is compared with the previously logged metrics of other runs.
If the validation accuracy of the current run is too low, the run gets terminated.
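For reference, a sketch of the corresponding wandb sweep configuration (the metric name val_accuracy must match what the training function logs; train is a placeholder for our training function):

import wandb

sweep_config = {
    'method': 'bayes',  # Bayesian optimisation over the search space
    'metric': {'name': 'val_accuracy', 'goal': 'maximize'},
    'early_terminate': {'type': 'hyperband', 'min_iter': 10},
    'parameters': {
        'num_epochs':    {'values': [20, 30]},
        'num_filters':   {'values': [[32, 64, 128, 256, 512],
                                     [64, 64, 64, 64, 64],
                                     [512, 256, 128, 64, 32]]},
        'filter_size':   {'values': [[[7, 7], [7, 7], [5, 5], [3, 3], [2, 2]],
                                     [[2, 2], [3, 3], [5, 5], [7, 7], [7, 7]],
                                     [[3, 3], [3, 3], [3, 3], [3, 3], [3, 3]]]},
        'learning_rate': {'values': [1e-3, 1e-4]},
        'batch_norm':    {'values': [True, False]},
        'dense_neurons': {'values': [128, 256, 512]},
        'augmentation':  {'values': [True, False]},
        'dropout':       {'values': [0, 0.2, 0.5]},
    },
}

# 'train' is a placeholder for the training function that logs val_accuracy to wandb.
sweep_id = wandb.sweep(sweep_config, project='cs6910_assignment2')
wandb.agent(sweep_id, function=train)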
Question 3 (15 Marks)
Based on the above plots write down some insightful observations. For example,
- adding more filters in the initial layers is better
- Using bigger filters in initial layers and smaller filters in latter layers is better
- ..
(Note: I don't know if any of the above statements is true. I just wrote some random comments that came to my mind)
Answer :-
1. Number of filters in each layer.
From the parallel coordinates chart, it can be seen that :-
Most of the high validation accuracy configurations use the doubling filter organisation, i.e., the number of filters in each deeper layer is double that of the previous layer: [32, 64, 128, 256, 512].
There are also a few models with good validation accuracy that use the same number of filters in each layer: [64, 64, 64, 64, 64].
However, the halving configuration, i.e., the number of filters in each deeper layer is half that of the previous layer [512, 256, 128, 64, 32], does not work well, and models with this configuration have lower validation accuracy.
2. Size of filters corresponding to each layer.
- Keeping the same filter kernel size throughout the network worked better for us: [(3,3), (3,3), (3,3), (3,3), (3,3)].
- Most of the high validation accuracy models have this filter size configuration (3x3 kernels in each convolution layer).
3. Number of Neurons in Dense Layer
- As evident from the correlation summary, number of neurons in dense layer has the highest positive correlation with validation accuracy.
- That is, the more neurons in the dense layer, the higher the validation accuracy.
- Almost all the high validation accuracy models have a dense layer of size 512.
- A dense layer of size 128 didn't work well.
- A dense layer of size 256 also didn't work well, though it produced better configurations than size 128.
4. Learning Rate
- As evident from the correlation summary, learning rate has strong negative correlation with validation accuracy.
- In almost all the high validation accuracy configurations, the learning rate was 1e-4.
5. Batch Normalization
- Batch normalization didn't work well when the learning rate was low (1e-4).
- However, we see improvements in performance when using batch normalization with a learning rate of 1e-3.
6. Data Augmentation.
- Without data augmentation, the model overfits the training data and reaches a training accuracy close to 99% in just 10 epochs.
- However, the validation accuracy of the model is very low in comparison.
- Therefore, data augmentation is very essential to help improve the generalization ability of the model.
7. Dropout
- Dropout also has a strong negative correlation with validation accuracy.
- Higher values of dropout like 0.5 hamper the performance of the model.
- Dropout of 0.2 works better than dropout of 0.5
8. Number of Epochs
- Number of epochs is positively correlated with validation accuracy.
- It was observed that models trained for 30 epochs significantly outperform the models trained for 20 epochs.
- In general it was observed that these models converge slowly, therefore training for more epochs is beneficial.
Question 4 (5 Marks)
You will now apply your best model on the test data (You shouldn't have used test data so far. All the above experiments should have been done using train and val data only).
(a) Use the best model from your sweep and report the accuracy on the test set.
(b) Provide a 10 x 3 grid containing sample images from the test data and predictions made by your best model (more marks for presenting this grid creatively).
(c) Visualise all the filters in the first layer of your best model for a random image from the test set. If there are 64 filters in the first layer plot them in an 8 x 8 grid.
Answer :-
(a)
The best model from our sweep has the following configuration :-
- num_epochs : 20
- num_filters : [32, 64, 128, 256, 512]
- filter_size : [(2,2), (3,3), (5,5), (7,7), (7,7)]
- pool_size : [(2,2), (2,2), (2,2), (2,2), (2,2)]
- learning_rate : 1e-4
- batch_norm : False
- dense_neurons : 512
- augmentation : True
- dropout : 0
- train_accuracy = 45.69
- val_accuracy = 40.24
- test_accuracy = 43.50
Since the first CONV layer has 32 filters, the figure above is arranged as a 4x8 grid. Each feature map is 224x224. Explanation of some of the images obtained after applying the first-layer filters:
- filter [0,0]: has all the neurons dead.
- filter [0,1]: extracts the actual creature.
- filter [2,2]: extracts the background details.
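A minimal sketch of how such a grid of first-layer feature maps can be produced (assumes the trained best model is loaded as model and img is a preprocessed 224x224x3 test image; the layer lookup may differ in the actual code):

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

# Model that outputs the activations of the first convolution layer.
first_conv = next(l for l in model.layers if isinstance(l, tf.keras.layers.Conv2D))
activation_model = tf.keras.Model(model.inputs, first_conv.output)

feature_maps = activation_model.predict(img[np.newaxis, ...])[0]  # (H, W, 32)

rows, cols = 4, 8  # 32 first-layer filters arranged in a 4x8 grid
fig, axes = plt.subplots(rows, cols, figsize=(16, 8))
for i, ax in enumerate(axes.flat):
    ax.imshow(feature_maps[:, :, i], cmap='viridis')
    ax.axis('off')
plt.tight_layout()
plt.show()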
Question 5 (10 Marks)
Apply guided back propagation on any 10 neurons in the CONV5 layer and plot the images which excite this neuron. The idea again is to discover interesting patterns which excite some neurons. You will draw a 10 x 1 grid below with one image for each of the 10 neurons.
Answer :-
The input image is from the 'Arachnida' class. It is resized to 224x224. The fifth CONV layer has dimensions (14, 14, 512). Run the Colab code 'guided_backpropagation.py' from the GitHub repo to reproduce the output shown above.
Explaining some of the guided backpropagation images:
- 1st image: the fixed neuron is at position (7, 7, 356) in the fifth CONV layer. Since this neuron lies almost at the centre of the feature map, it is influenced mostly by the central region of the image, and since the spider is in the centre it captures details of the spider.
- 2nd image: the fixed neuron is at position (10, 8, 356) in the fifth CONV layer. Since this neuron lies slightly towards the bottom of the feature map, it is influenced mostly by that region, and since the spider's head is in the bottom part it captures details of the spider's head.
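For reference, a minimal TF2 sketch of guided backpropagation (the layer name 'conv5' and the neuron index are illustrative; the actual implementation is in guided_backpropagation.py):

import numpy as np
import tensorflow as tf

@tf.custom_gradient
def guided_relu(x):
    def grad(dy):
        # Pass gradients only where both the incoming gradient and the activation are positive.
        return tf.cast(dy > 0, dy.dtype) * tf.cast(x > 0, dy.dtype) * dy
    return tf.nn.relu(x), grad

# Truncate the trained model at the fifth conv layer and swap its ReLUs for guided ReLU.
gb_model = tf.keras.Model(model.inputs, model.get_layer('conv5').output)
for layer in gb_model.layers:
    if hasattr(layer, 'activation') and layer.activation is tf.keras.activations.relu:
        layer.activation = guided_relu

img_tensor = tf.convert_to_tensor(img[np.newaxis, ...], dtype=tf.float32)
with tf.GradientTape() as tape:
    tape.watch(img_tensor)
    conv5_out = gb_model(img_tensor)
    neuron = conv5_out[0, 7, 7, 356]  # one fixed neuron in the (14, 14, 512) feature map
grads = tape.gradient(neuron, img_tensor)[0].numpy()  # gradient image to plot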
Question 6 (10 Marks)
Paste a link to your github code for Part A
Example: https://github.com/<user-id>/cs6910_assignment2/partA;
- We will check for coding style, clarity in using functions and a README file with clear instructions on training and evaluating the model (the 10 marks will be based on this).
- We will also run a plagiarism check to ensure that the code is not copied (0 marks in the assignment if we find that the code is plagiarised).
- We will check the number of commits made by the two team members and then give marks accordingly. For example, if we see 70% of the commits were made by one team member then that member will get more marks in the assignment (note that this contribution will decide the marks split for the entire assignment and not just this question).
- We will also check if the training and test data has been split properly and randomly. You will get 0 marks on the assignment if we find any cheating (e.g., adding test data to training data) to get higher accuracy.
Answer :-
Github Link for Part A :-
https://github.com/PranjalChitale/CS6910_Assignment2/tree/main/part_a
Part B : Fine-tuning a pre-trained model
Question 1 (5 Marks)
In most DL applications, instead of training a model from scratch, you would use a model pre-trained on a similar/related task/dataset. From keras, you can load any model (InceptionV3, InceptionResNetV2, ResNet50, Xception, etc) pre-trained on the ImageNet dataset. Given that ImageNet also contains many animal images, it stands to reason that using a model pre-trained on ImageNet may be helpful for this task.
You will load a pre-trained model and then fine-tune it using the naturalist data that you used in the previous question. Simply put, instead of randomly initialising the weights of a network you will use the weights resulting from training the model on the ImageNet data (keras directly provides these weights). Please answer the following questions:
(a) The dimensions of the images in your data may not be the same as that in the ImageNet data. How will you address this?
(b) ImageNet has 1000 classes and hence the last layer of the pre-trained model would have 1000 nodes. However, the naturalist dataset has only 10 classes. How will you address this?
Your implementation should be modular so that it allows to swap in any model (InceptionV3, InceptionResNetV2, ResNet50, Xception).
(Note: This question is only to check the implementation. The subsequent questions will talk about how exactly you will do the fine-tuning)
Answer :-
a) We resized the images to the input dimensions required by each model using the flow_from_directory and ImageDataGenerator utilities from tensorflow.keras.
b) We used only the feature-extraction part of each pre-trained model and appended our own fully connected layers at the end to map the output to 10 classes (a sketch is shown below).
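A minimal sketch of this modular setup (the helper name and dictionary are ours; any of the listed architectures can be swapped in, and each is resized to its expected input size):

import tensorflow as tf

BACKBONES = {
    'InceptionV3': (tf.keras.applications.InceptionV3, (299, 299)),
    'InceptionResNetV2': (tf.keras.applications.InceptionResNetV2, (299, 299)),
    'ResNet50': (tf.keras.applications.ResNet50, (224, 224)),
    'Xception': (tf.keras.applications.Xception, (299, 299)),
}

def build_finetune_model(base_model_name='ResNet50', dense_neurons=256, num_classes=10):
    backbone_cls, image_size = BACKBONES[base_model_name]
    # include_top=False drops ImageNet's 1000-way classifier; weights='imagenet' loads
    # the pre-trained feature extractor.
    base_model = backbone_cls(include_top=False, weights='imagenet',
                              input_shape=image_size + (3,))
    x = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
    x = tf.keras.layers.Dense(dense_neurons, activation='relu')(x)
    outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(x)  # 10 classes
    return tf.keras.Model(base_model.input, outputs), image_size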
Question 2 (5 Marks)
You will notice that InceptionV3, InceptionResNetV2, ResNet50, Xception are very huge models as compared to the simple model that you implemented in Part A. Even fine-tuning on a small training data may be very expensive. What is a common trick used to keep the training tractable (you will have to read up a bit on this)? Try different variants of this trick and fine-tune the model using the iNaturalist dataset. For example, '___'ing all layers except the last layer, '___'ing upto k layers and '___'ing the rest. Read up on pre-training and fine-tuning to understand what exactly these terms mean.
Write down the different strategies that you tried (simple bullet points would be fine).
Answer :-
The common trick is freezing: freezing all the layers except the last layer, or training only the top k layers and freezing the rest.
To keep fine-tuning tractable, we first trained only the newly added fully connected layers while keeping all of the pre-trained model's layers frozen. Once the fully connected layers had been trained this way, we unfroze a few of the pre-trained model's top layers and trained them as well (i.e., fine-tuned part of the network). We warmed up the fully connected layers first because, if we start fine-tuning immediately, the randomly initialised fully connected layers produce a very high loss; the resulting large gradients would change the pre-trained weights drastically and destroy the benefit of transfer learning. A key point during fine-tuning is to use a low learning rate so that the pre-trained weights are not altered by large amounts.
Also, when fine-tuning, it is best not to train the whole model but to start with a few of the top layers of the pre-trained model. The lower layers capture very simple features (like lines, curves, etc.) that are present in almost every dataset, while the higher layers are the ones more specialised to the given dataset.
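A minimal sketch of this two-stage strategy (assumes the build_finetune_model helper sketched in Question 1 and existing train_gen/val_gen generators; the value of k, the learning rates and the epoch counts are illustrative):

import tensorflow as tf

model, image_size = build_finetune_model('Xception', dense_neurons=256)
base_depth = len(model.layers) - 3  # everything before the new GAP + two dense layers

# Stage 1: freeze the whole backbone and train only the new dense head.
for layer in model.layers[:base_depth]:
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_gen, validation_data=val_gen, epochs=5)

# Stage 2: unfreeze the top k backbone layers and fine-tune with a low learning rate.
k = 30
for layer in model.layers[base_depth - k:base_depth]:
    layer.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(train_gen, validation_data=val_gen, epochs=5)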
Question 3 (15 Marks)
Now finetune the model using different strategies that you discussed above and different hyperparameter choices. Based on these experiments write down some insightful inferences (once again you will find the sweep function to be useful to plot and compare different choices).
Here are some examples of inferences that you can draw:
- Using a huge pre-trained network works better than training a smaller network from scratch (as you did in Part A)
- InceptionV3 works better for this task than ResNet50
- Using a pre-trained model, leads to faster convergence as opposed to training a model from scratch
- ... ....
(Note: I don't know if any of the above statements is true. I just wrote some random comments that came to my mind) Of course, provide evidence (in the form of plots) for each inference.
Of course, provide appropriate plots for the above inferences (mostly automatically generated by wandb). The more insightful and thorough your inferences and the better the supporting evidence (in terms of plots), the more you will score in this question.
Answer :-
- Using a pre-trained network is far better than training a smaller network from scratch, as the pre-trained network gives much better performance and converges very fast. The best pre-trained network reached 0.83 validation accuracy in just 9 epochs, whereas the best smaller network reached 0.4 validation accuracy after 20 epochs.
- Using a pre-trained network (as a feature extractor) with fine-tuning is a better way to train a general CNN model than training a small model from scratch.
- Xception and InceptionResNetV2 gave best results.
- ResNet50 performed worst among pre-trained models.
- Fine-tuning a few of the top layers of the pre-trained network, after first training the new head with the backbone frozen, is better than fine-tuning the whole model at once, since most of the initial layers capture very basic details (like curves and lines) which are present in almost every dataset.
- The top layers of the feature-extraction models capture dataset-specific details, and if we train those in particular, we can achieve higher accuracy with fewer trainable parameters.
- The size of the fully connected layers plays an important role, as most of the parameters are contributed by these layers: the larger the layer, the more parameters.
- Increasing the batch size reduces the training and fine-tuning time.
- The pre-trained models reach 0.70 validation accuracy in 4-5 epochs, so training can be capped at 10 epochs.
Question 4 (10 Marks)
Paste a link to your github code for Part B
Example: https://github.com/<user-id>/cs6910_assignment2/partB
Follow the same instructions as in Question 6 of Part A.
Answer :-
Github Link for Part B :-
https://github.com/PranjalChitale/CS6910_Assignment2/tree/main/part_b
Part C : Using a pre-trained model as it is
Question 1 (15 Marks)
Object detection is the task of identifying objects (such as cars, trees, people, animals) in images. Over the past 6 years, there has been tremendous progress in object detection with very fast and accurate models available today. In this question you will use a pre-trained YoloV3 model and use it in an application of your choice. Here is a cool demo of YoloV2 (click on the image to see the demo on youtube).
Go crazy and think of a cool application in which you can use object detection (alerting lab mates of monkeys loitering outside the lab, detecting cycles in the CRC corridor, ....). More marks if you come up with an application which has social relevance.
Make a similar demo video of your application, upload it on youtube and paste a link below (similar to the demo I have pasted above).
Also note that I do not expect you to train any model here but just use an existing model as it is. However, if you want to fine-tune the model on some application-specific data then you are free to do that (it is entirely up to you).
Notice that for this question I am not asking you to provide a github link to your code. I am giving you a free hand to take existing code and tweak it for your application. Feel free to paste the link of your code here nonetheless (if you want).
Example: https://github.com/<user-id>/cs6910_assignment2/partC
Answer :-
One of the major issues faced by citizens in India during road travel is Potholes.
According to a report released by the Ministry of Roads, Government of India :-
"In 2017, 3,600 deaths were recorded due to potholes across the country, taking almost 10 lives daily."
Potholes pose a serious threat to citizens and are one of the main causes of road accidents.
The problem of fixing potholes is tricky, as a pothole can be fixed only if it is reported to the civic authorities.
However, not all potholes get reported to the authorities; they remain as they are and can cause accidents.
In particular, the Municipal Corporation of Greater Mumbai has launched an app on the Google Play Store wherein citizens can click a photo of a pothole and report it along with its GPS location, after which the authorities fix it.
Though this is a commendable step, it still doesn't seem a practical, permanent solution to the problem, since it relies on manual reports rather than automation, and citizens cannot be expected to take the time to click photos and report potholes, particularly considering the traffic conditions.
Therefore, we need a solution that can be fitted on vehicles and that automatically detects potholes in the video feed and directly sends a report to the authorities along with GPS coordinates.
This will not only save time and citizens' effort but also automate the process and improve efficiency, as potholes in the video stream will be directly captured by the system, detected and reported to the authorities.
Such a system would only need a camera module fitted on a vehicle, paired with an Android app.
Such a system can easily be deployed on vehicles like Garbage trucks, Police surveillance vehicles which usually visit various parts of the city everyday.
An essential module of this would be a pothole detector, which is the application we have selected to develop a demo video.
We have used the pre-trained weights of YoloV4 by Alexey Bochkovskiy et al.
As the pothole class was not present in the original model, we followed the tutorial provided on the above webpage and fine-tuned the model on a publicly available pothole dataset which contains around 1.2K pothole images annotated as per the YOLO labeling format.
Let's see the model in action on a demo video. The original footage is sourced from Times of India and India Today.
We trimmed the above videos and selected a few portions from both to create a combined video, which was passed as input to the pothole detection model.
Clideo Tool was used to compress the video as the size was too large.
Here is the demo video depicting the potholes detected by the model :-
[1] https://github.com/AlexeyAB/darknet
[2] https://drive.google.com/file/d/1CsT0vUHu8_CO80KnquP_jBgSM3EXznRp/view
Self Declaration
List down the contributions of the two team members:
For example,
CS21S022: (50% contribution)
- Part A (Except Q1, Q5)
- Part C
CS21M050: (50% contribution)
- Part A (Q1, Q5)
- Part B
We, Pranjal Agadh Chitale and Ravindra Kumar Vaishya, swear on our honour that the above declaration is correct.
Note: Your marks in the assignment will be in proportion to the above declaration. Honesty will be rewarded (Help is always given in CS6910 to those who deserve it!).
This is an opportunity for you to come clean. If one of the team members has not contributed then it should come out clearly from the above declaration. There will be a viva after the submission. If your performance in the viva does not match the declaration above then both the team members will be penalised (50% of the marks earned in the assignment).