
Classification of domestic environments

A project to embed in a real world scenario.
Created on December 29 | Last edited on February 22
Note: this is an unedited example of a student's homework, shared to showcase how Reports can be valuable in a classroom setting

Abstract

The task chosen for this homework is the classification of domestic environments. The main goal behind this choice is to embed the project in a real-world website. The purpose of this website is to give people the opportunity to find new apartments in new buildings that meet specified requirements. So, when the renderings of an apartment are uploaded to the website, the inference process will attach a classification label to each image; this information will then be useful to automatically create new filters based on content that is periodically added.
For this project the library chosen is PyTorch; Weights & Biases has been used to generate this report and most of the data visualization graphs and images in this document. Furthermore, two different types of model have been chosen: Deit, a Vision Transformer model based on the transformer architecture, and EfficientNet, based on the classic convolutional neural network architecture.
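As a reference, a run can be logged to Weights & Biases roughly as in the sketch below; the project name, config values and logged metrics are illustrative placeholders, not the exact ones used for this report.

```python
# Minimal sketch of logging a training run to Weights & Biases so that
# metric charts like the ones in this report can be built; project name,
# config values and metrics are illustrative placeholders.
import wandb

run = wandb.init(project="domestic-environments", config={"batch_size": 128, "epochs": 10})

for epoch in range(run.config["epochs"]):
    # ... training and validation would happen here ...
    wandb.log({"epoch": epoch, "val_accuracy": 0.0, "val_loss": 0.0})  # placeholder metrics

run.finish()
```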

Data distribution

The dataset is self-made and almost balanced; to create it, a Google Chrome extension was used that downloads all the images returned by a Google Images search.
The dataset is composed of 2911 images divided into 10 classes; in the interactive graph below it is possible to see the exact number of images for each class: Balcony, Bathroom, Bedroom, Fireplace, Garden, Hammok, Kitchen, Panoramic view, Pool, Stairs.
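The per-class counts can also be obtained programmatically from the folder structure. The sketch below assumes the downloaded images are organized in one sub-folder per class and uses torchvision's ImageFolder; the root path and the 224x224 resize are only illustrative choices.

```python
# Minimal sketch: load the self-made dataset and count images per class.
# Assumes one sub-folder per class (e.g. data/Balcony, data/Bathroom, ...);
# the root path and the resize size are illustrative.
from collections import Counter

from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("data", transform=preprocess)

counts = Counter(dataset.targets)                      # class index -> number of images
for class_name, class_idx in dataset.class_to_idx.items():
    print(f"{class_name}: {counts[class_idx]} images")
print(f"Total: {len(dataset)} images")                 # 2911 for this dataset
```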


Dataloaders have been used to manage the division of the entire dataset into training, validation and test sets.
The ratio for the division is as follows: training 80%, validation and test 10% each. The batch size for training is 128 images and the batch size for validation and test is 32. Given this, there are 19 batches (2324 images) for training, 9 batches for validation and 10 batches for test.
All the tests have been made with the same random seed for splitting to ensure that the comparison is fair.
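A minimal sketch of how this split can be reproduced with a fixed seed, assuming the `dataset` object from the previous sketch; the seed value itself is only illustrative.

```python
# Minimal sketch of the 80/10/10 split and the dataloaders, assuming the
# `dataset` object from the previous sketch; the seed value is illustrative.
import torch
from torch.utils.data import DataLoader, random_split

n_total = len(dataset)
n_train = int(0.8 * n_total)
n_val = int(0.1 * n_total)
n_test = n_total - n_train - n_val

generator = torch.Generator().manual_seed(0)    # same seed for every test run
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n_test], generator=generator
)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)
test_loader = DataLoader(test_set, batch_size=32)
```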

Data preprocessing

Models & Architectures

In this work have been used 2 different models: VisionTransformer and EfficientNet.

VisionTransformer Deit

Efficient Net

Runs

Both models' architectures have been slightly adapted to the specific problem requirements, such as the number of output classes.
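As an illustration, this is one way the two models could be adapted to the 10 classes of the problem, assuming timm is used for Deit and torchvision for EfficientNet-B0; the actual head definitions used in the runs (in particular head type 1 and head type 2) may differ from this sketch.

```python
# Minimal sketch of adapting both models to 10 output classes.
# Assumes timm for Deit and torchvision for EfficientNet-B0; the exact
# layers of head type 1 / head type 2 are not reproduced here.
import timm
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10

# Deit: timm can resize the classifier at creation time.
deit = timm.create_model("deit_base_patch16_224", pretrained=True, num_classes=NUM_CLASSES)

# EfficientNet-B0: replace the last linear layer of the classifier.
effnet = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)
in_features = effnet.classifier[1].in_features
effnet.classifier[1] = nn.Linear(in_features, NUM_CLASSES)
```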

VisionTransformer Deit

Efficient Net

Results

Note that to calculate the metrics, weighted averaging has been used: a method of averaging the scores for each class in a multi-class classification problem. It is similar to macro averaging, which involves computing the scores for each class separately and then averaging them; however, in weighted averaging each class is weighted by the number of samples it contains. This has been done using the sklearn.metrics package.
The metrics calculated are precision, recall and F1 measure; weighted averaging has been chosen over macro averaging to take into account the small imbalances in the class distribution of the dataset.
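A minimal sketch of how these weighted-averaged metrics can be computed with sklearn.metrics; here y_true and y_pred are placeholders for the labels and predictions collected over the test set.

```python
# Minimal sketch of the weighted-averaged precision, recall and F1;
# y_true and y_pred are placeholders for the test labels and predictions.
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 1, 2, 2, 1]   # illustrative values only
y_pred = [0, 1, 2, 1, 1]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"precision={precision:.3f}  recall={recall:.3f}  f1={f1:.3f}")
```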

Vision Transformer Deit

Efficient Net

Head type 1

The type 1 head got the following measures of performance on the test set:
  • Test Loss: 0.0536
  • Test Accuracy: 88% (256 correct)
In detail, these are the per-class accuracies:
Class            Accuracy
Balcony          55%
Bathroom         96%
Bedroom          90%
Fireplace        88%
Garden           83%
Hammok           95%
Kitchen          93%
Panoramic_view   88%
Pool             91%
Stairs           96%

Confusion Matrix EfficientNet-B0 with head type 1
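For reference, a confusion matrix like the one above can be produced roughly as follows, reusing the `effnet` and `test_loader` names from the previous sketches; in practice the model would first be trained and moved to the appropriate device.

```python
# Minimal sketch of building the confusion matrix on the test set,
# reusing the `effnet` model and `test_loader` from earlier sketches.
import torch
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

effnet.eval()
y_true, y_pred = [], []
with torch.no_grad():
    for images, labels in test_loader:
        outputs = effnet(images)
        y_pred.extend(outputs.argmax(dim=1).tolist())
        y_true.extend(labels.tolist())

cm = confusion_matrix(y_true, y_pred)
ConfusionMatrixDisplay(cm, display_labels=dataset.classes).plot()
```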


Head type 2

The type 2 head got the following measures of performance on the test set:
  • Test Loss: 0.0511
  • Test Accuracy: 73% (212 correct)
In detail, these are the per-class accuracies:
Class            Accuracy
Balcony          42%
Bathroom         57%
Bedroom          70%
Fireplace        88%
Garden           81%
Hammok           80%
Kitchen          63%
Panoramic_view   78%
Pool             77%
Stairs           89%

Confusion Matrix EfficientNet-B0 with head type 2

Conclusion

Looking at the test set accuracy reached in the three tests, the best model is EfficientNet with head type 1 with 88%, then the VisionTransformer model (Deit) with 84%, and last, EfficientNet with head type 2 with 73%.
An observation on the per-class accuracy values on the test set: it looks like the most difficult class to classify correctly is the 'Balcony' class. Something that would be useful to increase performance is to give the model more samples of that class (it has slightly fewer samples than the other classes) or to apply further augmentation techniques, as sketched below.
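As a sketch of what such augmentation could look like; the transforms and their parameters are illustrative, not the ones used in these runs.

```python
# Minimal sketch of a training-time augmentation pipeline that could help
# under-represented classes such as 'Balcony'; parameters are illustrative.
from torchvision import transforms

train_augmentation = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])
```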
Looking at the validation set accuracy over the training phase:


From this plot it is possible to see that, even if EfficientNet with head type 1 has a small advantage in test set accuracy with respect to the Deit model, the validation accuracy does not change much and the two curves look very similar.
Other observations can be made about the remaining performance measures (precision, recall and F1) obtained from the models.



It is easy to observe that on every batch of the test set, and for every metric, EfficientNet with head type 1 reaches the highest values, while EfficientNet with head type 2 stays below the other two models.

In conclusion, even if head 2 for EfficientNet was supposed to create a better bottleneck towards the 10 classes in the last layer, it turned out to be the worst case in terms of performance. Looking at the other two tests, it is worth noting that the Deit model was kept frozen in its original architecture, so its weights were not updated, which made it harder to learn and recognize the features; this could be a reason why it performed worse than EfficientNet with head type 1.
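As an illustration of the frozen setup described above; the optimizer choice and learning rate are assumptions, not the values used in the runs.

```python
# Minimal sketch of the frozen Deit backbone: every parameter is frozen
# except the classification head, so only the head receives weight updates.
# The optimizer and learning rate are illustrative assumptions.
import torch

for param in deit.parameters():
    param.requires_grad = False
for param in deit.get_classifier().parameters():
    param.requires_grad = True

optimizer = torch.optim.Adam(
    (p for p in deit.parameters() if p.requires_grad), lr=1e-4
)
```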