CS6910 Assignment-2
Part A
Question - 1
Let's assume the size of the image is $H \times W \times 3$. We have a stride of $1$ for the convolution layers, which are of size $k \times k$, with $m$ filters in each convolution layer and $n$ units in the dense layer. The max-pooling kernels have a size of $F$ and a stride of $F$. There is no batch-normalization involved. Hence the computations arise from multiplications, additions, ReLU comparisons and max-pooling comparisons.
(a) Total number of computations by the model
- First Layer: Input dim: $H \times W \times 3$
- Conv with ReLU: $(W - k + 1) \cdot (H - k + 1) \cdot m \cdot [3k^2 + (3k^2 - 1) + 1 + 1] = (W - k + 1) \cdot (H - k + 1) \cdot m \cdot [6k^2 + 1]$ (for each entry in the new feature map, we have $3k^2$ multiplications, $3k^2 - 1$ additions, $1$ bias addition and $1$ ReLU comparison)
- MaxPooling: Let $W' = (W - k + 1)$ and $H' = (H - k + 1)$. After max-pooling the dimensions would be $W_1 = \left\lfloor\dfrac{W'}{F}\right\rfloor$ and $H_1 = \left\lfloor\dfrac{H'}{F}\right\rfloor$. $\therefore$ the number of computations are $W_1 \cdot H_1 \cdot m \cdot 3k^2 \cdot F^2$ (here, we assume that to find the maximum among $n$ elements, we need $n$ comparisons)
- Total: $((W - k + 1) \cdot (H - k + 1) \cdot m \cdot (6k^2 + 1)) + (H_1 \cdot W_1 \cdot m \cdot 3k^2 \cdot F^2)$
- Second Layer: Input dim: $H_1 \times W_1 \times m$
- Conv with ReLU: $(W_1 - k + 1) \cdot (H_1 - k + 1) \cdot m \cdot [mk^2 + (mk^2 - 1) + 1 + 1] = (W_1 - k + 1) \cdot (H_1 - k + 1) \cdot m \cdot [2mk^2 + 1]$ (for each entry in the new feature map, we have $mk^2$ multiplications, $mk^2 - 1$ additions, $1$ bias addition and $1$ ReLU comparison)
- MaxPooling: Let $W_1' = (W_1 - k + 1)$ and $H_1' = (H_1 - k + 1)$. After max-pooling the dimensions would be $W_2 = \left\lfloor\dfrac{W_1'}{F}\right\rfloor$ and $H_2 = \left\lfloor\dfrac{H_1'}{F}\right\rfloor$. $\therefore$ the number of computations are $W_2 \cdot H_2 \cdot m \cdot mk^2 \cdot F^2$ (here, we assume that to find the maximum among $n$ elements, we need $n$ comparisons)
- Total: $((W_1 - k + 1) \cdot (H_1 - k + 1) \cdot m \cdot (2mk^2 + 1)) + (H_2 \cdot W_2 \cdot m \cdot mk^2 \cdot F^2)$
- Third Layer: Input dim: $H_2 \times W_2 \times m$
- Conv with ReLU: $(W_2 - k + 1) \cdot (H_2 - k + 1) \cdot m \cdot [mk^2 + (mk^2 - 1) + 1 + 1] = (W_2 - k + 1) \cdot (H_2 - k + 1) \cdot m \cdot [2mk^2 + 1]$ (for each entry in the new feature map, we have $mk^2$ multiplications, $mk^2 - 1$ additions, $1$ bias addition and $1$ ReLU comparison)
- MaxPooling: Let $W_2' = (W_2 - k + 1)$ and $H_2' = (H_2 - k + 1)$. After max-pooling the dimensions would be $W_3 = \left\lfloor\dfrac{W_2'}{F}\right\rfloor$ and $H_3 = \left\lfloor\dfrac{H_2'}{F}\right\rfloor$. $\therefore$ the number of computations are $W_3 \cdot H_3 \cdot m \cdot mk^2 \cdot F^2$ (here, we assume that to find the maximum among $n$ elements, we need $n$ comparisons)
- Total: $((W_2 - k + 1) \cdot (H_2 - k + 1) \cdot m \cdot (2mk^2 + 1)) + (H_3 \cdot W_3 \cdot m \cdot mk^2 \cdot F^2)$
- Fourth Layer: Input dim: $H_3 \times W_3 \times m$
- Conv with ReLU: $(W_3 - k + 1) \cdot (H_3 - k + 1) \cdot m \cdot [mk^2 + (mk^2 - 1) + 1 + 1] = (W_3 - k + 1) \cdot (H_3 - k + 1) \cdot m \cdot [2mk^2 + 1]$ (for each entry in the new feature map, we have $mk^2$ multiplications, $mk^2 - 1$ additions, $1$ bias addition and $1$ ReLU comparison)
- MaxPooling: Let $W_3' = (W_3 - k + 1)$ and $H_3' = (H_3 - k + 1)$. After max-pooling the dimensions would be $W_4 = \left\lfloor\dfrac{W_3'}{F}\right\rfloor$ and $H_4 = \left\lfloor\dfrac{H_3'}{F}\right\rfloor$. $\therefore$ the number of computations are $W_4 \cdot H_4 \cdot m \cdot mk^2 \cdot F^2$ (here, we assume that to find the maximum among $n$ elements, we need $n$ comparisons)
- Total: $((W_3 - k + 1) \cdot (H_3 - k + 1) \cdot m \cdot (2mk^2 + 1)) + (H_4 \cdot W_4 \cdot m \cdot mk^2 \cdot F^2)$
- Fifth Layer: Input dim: $H_4 \times W_4 \times m$
- Conv with ReLU: $(W_4 - k + 1) \cdot (H_4 - k + 1) \cdot m \cdot [mk^2 + (mk^2 - 1) + 1 + 1] = (W_4 - k + 1) \cdot (H_4 - k + 1) \cdot m \cdot [2mk^2 + 1]$ (for each entry in the new feature map, we have $mk^2$ multiplications, $mk^2 - 1$ additions, $1$ bias addition and $1$ ReLU comparison)
- MaxPooling: Let $W_4' = (W_4 - k + 1)$ and $H_4' = (H_4 - k + 1)$. After max-pooling the dimensions would be $W_5 = \left\lfloor\dfrac{W_4'}{F}\right\rfloor$ and $H_5 = \left\lfloor\dfrac{H_4'}{F}\right\rfloor$. $\therefore$ the number of computations are $W_5 \cdot H_5 \cdot m \cdot mk^2 \cdot F^2$ (here, we assume that to find the maximum among $n$ elements, we need $n$ comparisons)
- Total: $((W_4 - k + 1) \cdot (H_4 - k + 1) \cdot m \cdot (2mk^2 + 1)) + (H_5 \cdot W_5 \cdot m \cdot mk^2 \cdot F^2)$
- Fully Connected layer: Input dim: $H_5 \cdot W_5 \cdot m$
- Multiplications: $n \cdot (H_5 \cdot W_5 \cdot m)$ (a row of the weight matrix multiplied by the input column needs $H_5 \cdot W_5 \cdot m$ multiplications, and we do this for $n$ such rows)
- Additions: $n \cdot ((H_5 \cdot W_5 \cdot m) - 1)$
- Total: $2n \cdot (H_5 \cdot W_5 \cdot m) - n$
- Output layer: Input dim: $n$
- Multiplications: $10 \cdot n$
- Additions: $10 \cdot (n - 1)$
- Total: $20n - 10$
The total computation will be the sum of all these individual computations of each layer, i.e. $((W - k + 1) \cdot (H - k + 1) \cdot m \cdot (6k^2 + 1)) + (H_1 \cdot W_1 \cdot m \cdot 3k^2 \cdot F^2) + ((W_1 - k + 1) \cdot (H_1 - k + 1) \cdot m \cdot (2mk^2 + 1)) + (H_2 \cdot W_2 \cdot m \cdot mk^2 \cdot F^2) + ((W_2 - k + 1) \cdot (H_2 - k + 1) \cdot m \cdot (2mk^2 + 1)) + (H_3 \cdot W_3 \cdot m \cdot mk^2 \cdot F^2) + ((W_3 - k + 1) \cdot (H_3 - k + 1) \cdot m \cdot (2mk^2 + 1)) + (H_4 \cdot W_4 \cdot m \cdot mk^2 \cdot F^2) + ((W_4 - k + 1) \cdot (H_4 - k + 1) \cdot m \cdot (2mk^2 + 1)) + (H_5 \cdot W_5 \cdot m \cdot mk^2 \cdot F^2) + (2n \cdot (H_5 \cdot W_5 \cdot m) - n) + (20n - 10)$
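To sanity-check this expression, it can be evaluated numerically. Below is a minimal sketch of such a check; the function name and the example values of $H$, $W$, $k$, $F$, $m$ and $n$ are purely illustrative.

```python
def total_computations(H, W, k, F, m, n):
    """Evaluate the layer-by-layer computation counts derived above (stride-1 valid conv, F x F max-pool)."""
    total = 0
    h, w, c_in = H, W, 3                                    # input: H x W x 3
    for _ in range(5):                                      # five conv + max-pool blocks, m filters each
        h_conv, w_conv = h - k + 1, w - k + 1               # spatial dims after the convolution
        h_pool, w_pool = h_conv // F, w_conv // F           # spatial dims after max-pooling
        total += w_conv * h_conv * m * (2 * c_in * k**2 + 1)   # conv: mults + adds + bias + ReLU
        total += w_pool * h_pool * m * c_in * k**2 * F**2      # max-pool comparisons, as counted above
        h, w, c_in = h_pool, w_pool, m
    d = h * w * m                                           # flattened input to the dense layer (H5*W5*m)
    total += 2 * n * d - n                                  # fully connected layer
    total += 20 * n - 10                                    # output layer (10 classes)
    return total

# purely illustrative values
print(total_computations(H=128, W=128, k=3, F=2, m=32, n=128))
```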
(b) Total number of parameters of the model
- First Layer: $(k \cdot k \cdot 3 \cdot m) + m = m \cdot (3k^2 + 1)$
- Second Layer: $(k \cdot k \cdot m \cdot m) + m = m \cdot (mk^2 + 1)$
- Third Layer: $(k \cdot k \cdot m \cdot m) + m = m \cdot (mk^2 + 1)$
- Fourth Layer: $(k \cdot k \cdot m \cdot m) + m = m \cdot (mk^2 + 1)$
- Fifth Layer: $(k \cdot k \cdot m \cdot m) + m = m \cdot (mk^2 + 1)$
- Fully Connected layer: $(H_5 \cdot W_5 \cdot m) \cdot n + n$
- Output layer: $10n + 10$
The total number of parameters will be the sum of all these individual parameters of each layer, i.e. $(m \cdot (3k^2 + 1)) + (4m \cdot (mk^2 + 1)) + ((H_5 \cdot W_5 \cdot m) \cdot n + n) + (10n + 10)$
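The parameter count can be checked the same way; a minimal sketch under the same illustrative values (the function and variable names are ours):

```python
def total_parameters(H, W, k, F, m, n):
    """Evaluate the parameter counts derived above."""
    params = m * (3 * k**2 + 1)            # first conv layer (3 input channels)
    params += 4 * m * (m * k**2 + 1)       # conv layers 2-5 (m input channels each)
    h, w = H, W
    for _ in range(5):                     # spatial size after the five conv + max-pool blocks
        h, w = (h - k + 1) // F, (w - k + 1) // F
    params += (h * w * m) * n + n          # fully connected layer
    params += 10 * n + 10                  # output layer (10 classes)
    return params

print(total_parameters(H=128, W=128, k=3, F=2, m=32, n=128))
```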
Question-2
We used the \texttt{Bayes Strategy} to converge to the best hyperparameter configuration quickly. We also used \texttt{EarlyStopping} to terminate runs which were not performing too well and also to curb overfitting. In this process we observed that the model would overfit if it ran for more than 25 epochs, and hence we fixed the epochs to 25 for all runs. The model is configured so that it stores the parameters from the epoch in which it obtains the best validation accuracy (this is reported as the best validation accuracy in the plots).
HYPERPARAMETER VALUES TESTED:
```python
sweep_config = {
    'method': 'bayes',
    'metric': {
        'name': 'val_accuracy',
        'goal': 'maximize'
    },
    'parameters': {
        'lr': {
            'values': [1e-3, 1e-4, 7e-5]
        },
        'batch_size': {
            'values': [16, 32, 64]
        },
        'num_dense': {
            'values': [64, 128, 256, 512]
        },
        'dense_activation': {
            'values': ['selu', 'softplus', 'relu', 'leaky_relu']
        },
        'optimizer': {
            'values': ['sgd', 'adamax', 'rmsprop', 'adam']
        },
        'batch_normalization': {
            'values': [True, False]
        },
        'dropout': {
            'values': [0.1, 0.2, 0.3]
        },
        'data_augmentation': {
            'values': [True, False]
        },
        'num_filters': {
            'values': [16, 32, 64]
        },
        'filter_multiplier': {
            'values': [0.5, 1, 2]
        },
        'filter_size': {
            'values': [3, 5]
        }
    }
}
```
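For completeness, a sweep with this configuration can be launched as shown below. This is a minimal sketch: the project name and the body of the \texttt{train} function are placeholders, not our actual code.

```python
import wandb

def train():
    # Placeholder training routine: the real one builds the CNN from run.config
    # (lr, batch_size, num_filters, ...), trains it, and logs val_accuracy per epoch.
    with wandb.init() as run:
        run.log({"val_accuracy": 0.0})  # dummy value standing in for the real metric

sweep_id = wandb.sweep(sweep_config, project="cs6910-assignment-2")  # project name is illustrative
wandb.agent(sweep_id, function=train, count=50)
```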
Question - 3
OBSERVATIONS:
- Lower learning rates worked well for the model. We reason this may be due to the very uneven error surface that results from a complex, deep network with significant non-linearity. The error surface likely has a lot of local minima, and with a higher learning rate the model gets stuck, "bouncing" around in a local minimum. With a learning rate of 0.01, the model does not even cross 10% accuracy, which means it is as good as random guessing.
- Within the limited number of epochs, i.e. 25, Adamax performed better than Adam, and Adam performed better than SGD. SGD converged, but it was very slow compared to the remaining optimizers. Since we capped the number of epochs at 25, the training with SGD was limited. However, there has been a study which showed that SGD/NAG, given enough steps, can generalize much better than faster optimizers. We do observe this in our experiments: better generalization (a lower gap between train accuracy and validation accuracy) is achieved with SGD, whereas Adamax does overfit significantly.
- Performing data augmentation worsened the learning, as the images were already kind of "pre-augmented", i.e. they had different sizes, zoom levels, and objects at different angles and positions in the image, and most of the creatures in the images were camouflaged (e.g. a scorpion in sand, a green insect between plants, a small fungus in the bushes).
- The activation at the dense layer did not matter much; any of the ReLU-family activations performed similarly. But changing the activation at the convolution layers (which was tested independently with cmdline arguments) to newer functions for CV applications like swish (or even selu/elu for that matter) boosted the average accuracy by 2-3% (see the sketch after this list).
- Due to the complexity of the problem, a larger and deeper network was required. Hence, filter_multiplier was chosen as 2 in most cases. Networks where the number of filters decreased in each layer had limited performance. As seen from the importance plot, filter_multiplier (doubling/halving/same) has a positive correlation with the performance.
- We did not use dropout in the convolution layers as we already added batch-normalization there. High dropout in the dense layer constrained the model, while low dropout caused over-fitting; a "medium" level of dropout, 0.2 (not high, not low), was chosen as the best value.
- Batch size has a positive correlation with the performance. Batch sizes like 32 were preferred over lower ones like 16. Though a larger batch size is computationally more expensive, it gives a better approximation of the true gradient, i.e. the gradient w.r.t. all data points.
- A very large dense layer is prone to over-fitting. If the dense layer was made large, at one point the number of parameters it required overshot the number of parameters in the convolution layers. Hence, we limited its search to 512, but 256 was preferred in most cases, which is "medium" (i.e. not too large, not too small).
- Batch normalization in our case had a negative correlation with the performance. Batch normalization works well when we have batches of very large size, say 128, 256 or 1024, while we had a low batch size like 32. Normalizing the values across a batch makes more statistical sense (is more accurate) when we have more samples; then we can normalize them with something closer to the "true mean" and "true variance", rather than the "sample mean" and "sample variance" obtained from fewer samples, which may lead to incorrect results.
- "Medium" size filters of size 5 were optimal compared to smaller ones like 3 and slightly larger ones like 7. Larger filters (tested independently once again) capture more area in an image, which may include a significant amount of background along with the object. Given that most of the creatures in the images were camouflaged, if viewed in a larger background, it will be difficult to identify the creature (as it gets blended in the environment). Whereas, smaller filters were not be able to capture the whole object completely and make an informed prediction.
Question - 4
a) BEST HYPERPARAMETERS and MODEL METRICS:
NUM_FILTERS = 32
CONV_FILTER_SIZE = 5
FILTER_MULT = 2
DATA_AUGMENTATION = False
DROPOUT = 0.2
BATCHNORM = False
OPTIM_NAME = "adamax"
LR = 7e-5
BATCH_SIZE = 32
DENSE_UNITS = 256
DENSE_ACTIVATION = "relu"
1. Train - loss: 0.4980, accuracy: 0.6994 (no validation split is made while evaluating)
2. Test - loss: 2.2633, accuracy: 0.3875
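A minimal sketch of how a model with these hyperparameters would be compiled and evaluated, assuming the \texttt{model} object from the sketch at the end of Question 3 and placeholder data generators (\texttt{train\_generator}, \texttt{test\_generator}):

```python
import tensorflow as tf

# Adamax with LR = 7e-5, as per the best configuration above
model.compile(
    optimizer=tf.keras.optimizers.Adamax(learning_rate=7e-5),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
# the generators are assumed to yield batches of 32 images with 10 one-hot classes
model.fit(train_generator, epochs=25)
test_loss, test_acc = model.evaluate(test_generator)
```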
b) PREDICTIONS OF OUR MODEL:

c) FILTERS AND FEATURE-MAPS FOR A GIVEN IMAGE:



In the feature-maps we can observe many edge-detectors (e.g. feature-map-32, feature-map-25). Feature-map-8, feature-map-22, feature-map-11 and feature-map-19 focus on the fungus in the grass, but a few of them focus on the grass around it (e.g. feature-map-27, feature-map-19).
Question-5


Question-6
Part-B
Question-1
(a) The dimensions of the images in your data may not be the same as that in the ImageNet data. How will you address this?
We resized the images to the ImageNet input size, which is (224, 224, 3), using the \texttt{ImageDataGenerator().flow\_from\_directory()} method while creating the data generators themselves. Specifically, we interpolated using the \texttt{nearest} strategy.
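A minimal sketch of this resizing step (the directory path and the rescale factor are assumptions, not necessarily our exact setup):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "inaturalist_12K/train",       # assumed directory layout: one sub-folder per class
    target_size=(224, 224),        # resize every image to the ImageNet input size
    interpolation="nearest",       # nearest-neighbour interpolation, as described above
    batch_size=64,
    class_mode="categorical",
)
```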
(b) ImageNet has 1000 classes and hence the last layer of the pre-trained model would have 1000 nodes. However, the naturalist dataset has only 10 classes. How will you address this?
In the base_model constructor, we set the argument \texttt{include\_top = False}. This removes the final 1000-unit classification layer from the base_model, and we only get the convolution layers. After that, we added a \texttt{GlobalAveragePooling} layer to apply average pooling and flatten the convolution output. Then, after passing that flattened output through a \texttt{Dense} layer, we added a final output layer with 10 units and \texttt{Softmax} activation to predict one of the 10 classes of the naturalist dataset.
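A minimal sketch of this head replacement, using InceptionResNetV2 as an example base_model; the dense-layer size of 1024 is the one mentioned in Question 3, and the activation shown is illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers

base_model = tf.keras.applications.InceptionResNetV2(
    include_top=False,                                      # drop the 1000-class ImageNet head
    weights="imagenet",
    input_shape=(224, 224, 3),
)
x = layers.GlobalAveragePooling2D()(base_model.output)      # pool and flatten the conv output
x = layers.Dense(1024, activation="relu")(x)                # dense layer before the output
outputs = layers.Dense(10, activation="softmax")(x)         # 10 naturalist classes
model = tf.keras.Model(base_model.input, outputs)
```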
Question-2
PRE-TRAINING VS FINE-TUNING:
- Pre-training, or transfer learning, is the process of using an already trained model (also called the base_model) with frozen weights as a feature extractor to train a model on a dataset different from the one the base_model was trained on.
- Fine-tuning is the process of not only using the base_model as a feature extractor but also unfreezing some of its weights so that it adjusts better to the given dataset. It is typically performed after the transfer-learning step has finished, to reduce the computation cost.
STRATEGIES USED:
- Freezing None of the layers of the base_model and training the whole model (includes the dense layers as well)
- Freezing 33% of the layers of the base_model and training the rest (includes the dense layers as well)
- Freezing 66% of the layers of the base_model and training the rest (includes the dense layers as well)
- Freezing all layers of the base_model and just training the top dense layers.
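A minimal sketch of how the freezing strategies listed above can be expressed for a Keras base_model (the helper name and the freeze fractions are ours):

```python
def freeze_base_model(base_model, freeze_fraction):
    """Freeze the first `freeze_fraction` of the base_model's layers (0.0, 0.33, 0.66 or 1.0)."""
    n_freeze = int(len(base_model.layers) * freeze_fraction)
    for layer in base_model.layers[:n_freeze]:
        layer.trainable = False
    for layer in base_model.layers[n_freeze:]:
        layer.trainable = True

# e.g. freeze_base_model(base_model, 0.66) freezes the first 66% of the layers;
# the dense layers added on top always remain trainable.
```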
Question-3
a) OBSERVATIONS:
- For this part, we experimented with a new optimizer called RectifiedAdam with LookAhead (a.k.a. RANGER); a minimal optimizer sketch is given after this list. Since the model we built is very deep and complex, we tried to add as much regularization to it as possible using decoupled weight decay (given as weight_decay in the optimizer), data augmentation, dropout and swish activation at the dense layer. For all the runs, we trained the model for exactly 5 epochs (with \texttt{EarlyStopping}). These models were so powerful that they could generalize well (~80% validation accuracy) within such limited steps, whereas the CNN used in Part-A never crossed 35% validation accuracy with the same number of epochs.
- Lower learning rates were always preferred for better training. With a higher learning rate, the model gets stuck. This may be because of the highly uneven error surface with multiple local minima; a higher learning rate causes the model to bounce around in a local minimum. As seen in the parameter importance plot as well, the learning rate has the largest negative correlation with the validation accuracy, and the lower it is, the better the training.
- If Data-Augmentation is being used, lower weight decay values provide better results. Since the model is already so constrained with data-augmentation, dropout, swish regularization and internal batch-normalization, adding more regularization will penalize the model too much and hinder the training.
- Due to the model complexities, the batch size was limited to 64; anything above could not be trained due to lack of memory. Batch size has a positive correlation with the performance, and a larger batch size captures more diversity in the input, allowing the model to generalize well. But a higher batch size does increase the runtime as well.
- Models with all of the layers frozen (only the final dense layers being trainable) resulted in the best performance across all models (especially Xception). This may be because the ImageNet weights were already appropriate enough for the naturalist dataset; we do see some overlap between the two datasets in terms of classes (like plants, animals etc.). This kind of training was also the cheapest in terms of memory/GPU usage and computation/runtime.
- Fully training the models (none of the layers frozen) also resulted in good performance (but not as good as all layers being frozen). This may be because training allowed slight adjustments to the weights, which improved the performance in some cases; but since the weights were already almost appropriate, disturbing them decreased the overall performance. This strategy also increases the runtime significantly.
- ResNet50 performed the worst in this setting. Perhaps using a deeper variant such as ResNet101/ResNet152 with good regularization would give better results. Probably simple conv-batchnorm-relu-maxpool blocks are not sufficient.
- For all the models, the more we train, the better the results we get (provided enough regularization is added; these models are large enough to overfit within a couple of epochs). Validation accuracy was proportional to training time.
- The new optimizer, RANGER, produced faster convergence/training for all models.
- InceptionResNetV2 consistently produced more than 75% validation accuracy with lower learning rates. This is because it combines the best of both worlds, i.e. multiple parallel filters (Inception) and residual connections (ResNet).
- Partially freezing any network in all cases produced "medium" results, i.e. between fully frozen and none frozen. Freezing fewer layers gave better results (as freeze has a negative correlation). Partially freezing the network allows the model to accommodate changes without completely disturbing the pre-trained weights (which are good for the naturalist dataset). But nevertheless, some of the weights will be altered during training, and so the performance is a bit lower.
- Regularization during training was the most important aspect with the pre-trained models. Without it, significant over-fitting was observed. A major contributor to this was also the large dense layer (1024 units) added before the output.
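A minimal sketch of the RANGER optimizer (RectifiedAdam wrapped in Lookahead) via tensorflow_addons, assuming the \texttt{model} from the Question 1 sketch; the sync_period/slow_step values shown are illustrative defaults, while the learning rate and weight decay match our best run below.

```python
import tensorflow_addons as tfa

radam = tfa.optimizers.RectifiedAdam(learning_rate=1e-4, weight_decay=1e-4)  # decoupled weight decay
ranger = tfa.optimizers.Lookahead(radam, sync_period=6, slow_step_size=0.5)  # "RANGER"

model.compile(optimizer=ranger, loss="categorical_crossentropy", metrics=["accuracy"])
```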
b) BEST HYPERPARAMETERS and MODEL METRICS:
BASE_MODEL = InceptionResNetV2
BATCH_SIZE = 64
FREEZE = 0.66
LEARNING_RATE = 0.0001
WEIGHT_DECAY = 0.0001
DATA_AUGMENTATION = False
loss: 0.6721 - accuracy: 0.8100
Question-4
Part-C
Question-1
\textbf{Application: Wildlife Monitoring}
\textbf{1. Motivation:}
- IIT-Madras has various animals which roam around freely! But recent forest department reports suggest that 77 animals died on the IIT-Madras campus last year and another 18 died in January and February this year.
- According to sources, four deer were found dead on IIT-Madras campus between March 16 and 17.
The forest department seems to not have an answer to the back-to-back animal deaths. The latest death of a monkey has become a cause of worry as monkeys come in close contact with human beings on the campus.
\textbf{2. Why Wildlife Monitoring?}
- The monitoring of wildlife includes keeping track of animal movements, studying the population distribution of wildlife, the condition of natural habitats, and the identification of possible threats to different wildlife species, including poaching.
- Such information promotes a better understanding of the status of different wildlife species and also contributes toward the better management of wildlife. Data collected through wildlife monitoring help us make sure that common species remain common and that rare, threatened, and endangered species receive continued protection and assistance. The animals detected can be recorded along with the location in the forest they are found in and their count (e.g. how many tigers were spotted during the day/night, how many times an elephant was detected in a particular area).
- This gives us a better understanding of the occurrence, distribution and status of wildlife in the preserves.
- Effective monitoring, which allows changes in population status to be detected early, provides opportunities to mitigate the pressures driving declines.
\textbf{3. Proposal:}
In recent years, the development of modern monitoring devices such as camera traps, collaring devices, and conservation drones has contributed to wildlife monitoring and management. Integrating deep learning technology with these devices helps us collect useful data quickly and monitor effectively.
We used a YOLOv5 model pre-trained on the COCO dataset to create a simple demo of wildlife monitoring (a minimal detection sketch is shown below the links).
Video : Wildlife Monitoring
Code : GitHub Link for Part-C
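A minimal sketch of the kind of detection step behind the demo; the image path and the animal-class filter are illustrative, and the actual demo runs on video frames.

```python
import torch

# load YOLOv5s pre-trained on COCO from the ultralytics hub
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

results = model("camera_trap_frame.jpg")         # placeholder image from a camera trap / drone feed
results.print()                                  # classes, confidences and bounding boxes
detections = results.pandas().xyxy[0]            # detections as a pandas DataFrame
animals = detections[detections["name"].isin(
    ["elephant", "bear", "zebra", "giraffe", "bird", "dog", "horse", "sheep", "cow"])]
print(animals[["name", "confidence", "xmin", "ymin", "xmax", "ymax"]])
```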
Self-Declaration
\textbf{Varun Gumma (CS21M070) - 100\% contribution:}
- Solved the number of parameters and computations question.
- Designed the model for Part-A and wrote the code for cmdline training and wandb sweeps.
- Analysed the performance of the CNN model and wrote down inferences/observations.
- Generated all required plots for Part-A training.
- Coded guided backprop and filter/feature-map visualization and analysed the results.
- Read up about new pre-trained models, optimizers and activations for Part-B.
- Pre-trained/Fine-tuned models for part-B using cmdline and wandb sweeps.
- Analysed the performances of pre-trained models and wrote down inferences/observations.
- Generated all required plots for Part-B training.
- Read up about YOLO and its applications.
- Came up with an idea of social relevance for Part-C.
\textbf{Hanumantappa Budihal (CS21M022) - 100\% contribution:}
- Solved the number of parameters and computations question.
- Designed the model for Part-A and wrote the code for cmdline training and wandb sweeps.
- Analysed the performance of the CNN model and wrote down inferences/observations.
- Generated all required plots for Part-A training.
- Coded guided backprop and filter/feature-map visualization and analysed the results.
- Read up about new pre-trained models, optimizers and activations for Part-B.
- Pre-trained/Fine-tuned models for part-B using cmdline and wandb sweeps.
- Analysed the performances of pre-trained models and wrote down inferences/observations.
- Generated all required plots for Part-B training.
- Read up about YOLO and its applications.
- Came up with an idea of social relevance for Part-C.