Plant Disease Classification
Overview
Crop diseases are a major threat to food production and food security, yet their identification remains difficult. Given an image of a plant leaf, we want to determine whether it is sick and, if so, identify the corresponding disease.
We will be using a dataset of more than 50,000 images of healthy and diseased plant leaves, collected under controlled conditions. The project's aim was to classify 26 diseases (plus a "healthy" category) across 14 different plants. We will be training different deep learning models.
Different datasets were generated:
- images with no augmentation
- images with augmentation (techniques: image flipping, gamma correction, noise injection, PCA color enhancement, rotation and scaling)
- images with augmentation & background removal (segmentation)
- images with augmentation in CIE LAB format instead of RGB color coordinates
We will be using the different augmented datasets.
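As an illustration of the augmentation techniques listed above, here is a minimal sketch using the albumentations library; the library choice and all parameter values are assumptions, not the exact settings used to build the datasets.

```python
import numpy as np
import albumentations as A

# Stand-in leaf image; in practice this would be a loaded dataset sample.
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)

# Illustrative pipeline covering the listed techniques.
augment = A.Compose([
    A.HorizontalFlip(p=0.5),                                      # image flipping
    A.VerticalFlip(p=0.5),
    A.RandomGamma(gamma_limit=(80, 120), p=0.5),                  # gamma correction
    A.GaussNoise(p=0.3),                                          # noise injection
    A.FancyPCA(alpha=0.1, p=0.3),                                 # PCA color augmentation
    A.ShiftScaleRotate(scale_limit=0.1, rotate_limit=30, p=0.5),  # rotation & scaling
])

augmented_image = augment(image=image)["image"]
```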
Our best model (LAB_2path_InceptionResNetV2) achieves an accuracy of 99.86% and an F1 score of 99.86% on our hold-out test set.
Dataset Class Distribution

- Our dataset is imbalanced:
- Orange_citrus_greening, Tomato_yellow_leaf_curl and Soybean_healthy are predominant (more than 5,000 images each). These classes are overrepresented, containing on average more than double the number of images of the other classes.
- Peach_bacterial_spot and Tomato_bacterial_spot are also heavily represented in the dataset.
Plant Distribution
- Soybean, Raspberry and Blueberry do not have a disease label.
- Squash and Orange do not have a healthy label.
- Among the plants, tomato, peach and apple have notably imbalanced class distributions.
Background Removal
I generated a dataset with background segmentation to see if our model would perform better on segmented leaves. However, since no segmentation masks were available, I implemented a custom background segmentation pipeline:
- Color cast removal using HSV filters + morphological transformations
- Opening: erosion followed by dilation
- Closing: dilation followed by erosion
- Optional:
- Contrast Limited Adaptive Histogram Equalization (CLAHE)
- Adjusting lightness with filters on the CIE LAB L channel
- Distance transformation
- The segmented image we used -> "back segm image" (the second-to-last image)
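As a minimal sketch of the core HSV masking + morphology step (the HSV threshold values and kernel size are illustrative assumptions, not the exact ones used):

```python
import cv2
import numpy as np

def remove_background(image_bgr,
                      lower_hsv=(25, 40, 40), upper_hsv=(95, 255, 255)):
    """Mask green-ish hues in HSV, then clean the mask with morphology."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(lower_hsv), np.array(upper_hsv))

    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    # Opening (erosion then dilation) removes small background speckles.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    # Closing (dilation then erosion) fills small holes inside the leaf.
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    # Keep only the pixels inside the mask; the background goes to black.
    return cv2.bitwise_and(image_bgr, image_bgr, mask=mask)
```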
- The mask on greenish colors also captures bluish colors, as some diseases have this color (see below).
- The custom background removal doesn't work perfectly on images where the background color is similar to a disease or leaf color.
- The background removal also has difficulties with shadows and light reflections.
Defining Baseline & best dataset
As the classical ML models (SVM, decision trees) don't perform well on our dataset, I decided to focus only on the deep learning models. Moreover, all the feature extraction (gray-level co-occurrence matrix, HOG, color histogram, texture, ...) required by the classical ML models takes a large amount of compute and time.
Creating Baseline
Simple CNN with 8-12 layers:
- testing different simple architectures with AlexNet as reference
- testing different normalization methods (sketched below):
- sample-wise scaling -> scale pixels between -1 and 1
- centering pixels with the training set mean
- scaling pixels between 0 and 1, then standardizing them with training set statistics (zero-centering by the mean, dividing by the standard deviation)
=> baseline: baseline_conv_sample_scale
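As a minimal sketch in NumPy, with a stand-in array in place of the real training set, the three normalization schemes look like this:

```python
import numpy as np

# Stand-in batch of raw images (pixels in [0, 255]).
x_train = np.random.randint(0, 256, (8, 128, 128, 3)).astype("float32")

# 1) Sample-wise scaling: map pixels to [-1, 1].
x_sample_scaled = x_train / 127.5 - 1.0

# 2) Centering: subtract the training set mean.
x_centered = x_train - x_train.mean()

# 3) Scale to [0, 1], then standardize with training set statistics.
x_01 = x_train / 255.0
x_standardized = (x_01 - x_01.mean()) / x_01.std()
```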
Selecting most relevant dataset
- basic (highly imbalanced): sizes 128 & 224
- augmented (reduced class imbalance): sizes 128 & 224
Best dataset => augmented with size 224
Augmentation:
- improves the model's performance
- enhances our model's ability to recognize new variants of the training data
- improves the model's ability to generalize to unseen images
Selected dataset => augmented with size 128:
- 128 -> for reduced computational cost
Selecting top-k models
I re-implemented and tested the CNN model from the paper "Reliable Deep Learning Plant Leaf Disease Classification Based on Light-Chroma Separated Branches".
In this paper, the authors propose a method that converts input images from RGB to CIE LAB coordinates. They then feed the achromatic L channel and the chromatic AB channels into 2 separate branches along the first 3 layers of a modified Inception V3. Afterwards they concatenate these 2 branches and feed the result to the following 5 inception blocks of the Inception V3. Finally they add a classification block (global average pooling followed by a softmax layer). This method aims to improve the model's classification reliability when the original RGB images are perturbed with several types of noise (salt and pepper, blurring, motion blurring and occlusions), which simulate common image variability found in the natural environment.
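Converting an RGB image to CIE LAB is straightforward with OpenCV (a minimal sketch; "leaf.jpg" is a placeholder path):

```python
import cv2

bgr = cv2.imread("leaf.jpg")                # placeholder path; OpenCV loads BGR
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)  # convert to CIE LAB
l_channel, a, b = cv2.split(lab)            # achromatic L + chromatic A, B
```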

Following this paper, I implemented a similar model based on the tiny InceptionResNet architecture.
For both models we compare 2 branch variants (a minimal sketch of the split follows the list):
- 20% L - 80% AB
- 50% L - 50% AB
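This sketch assumes the percentages denote the share of convolutional filters allocated to each branch; the stem depth, filter counts, input size and 38-class head are illustrative assumptions, not the actual architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

def two_path_stem(inputs, total_filters=64, l_share=0.2):
    # Split the LAB input into the achromatic L and chromatic AB channels.
    l_channel = layers.Lambda(lambda x: x[..., :1])(inputs)
    ab_channels = layers.Lambda(lambda x: x[..., 1:])(inputs)

    # Allocate filters between the branches according to the chosen ratio.
    n_l = int(total_filters * l_share)
    n_ab = total_filters - n_l

    l_path = layers.Conv2D(n_l, 3, strides=2, activation="relu")(l_channel)
    l_path = layers.Conv2D(n_l, 3, activation="relu")(l_path)

    ab_path = layers.Conv2D(n_ab, 3, strides=2, activation="relu")(ab_channels)
    ab_path = layers.Conv2D(n_ab, 3, activation="relu")(ab_path)

    # Merge the two branches before the shared inception blocks.
    return layers.Concatenate()([l_path, ab_path])

inputs = tf.keras.Input(shape=(128, 128, 3))         # LAB image
x = two_path_stem(inputs, l_share=0.2)               # 20% L / 80% AB variant
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(38, activation="softmax")(x)  # 38 classes assumed
model = tf.keras.Model(inputs, outputs)
```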
From these models we kept the one with the highest validation F1 score:
- LAB_2path_InceptionResNetV2 with 20% L - 80% AB.
We compare the LAB_2path_InceptionResNetV2 and the LAB_2path_InceptionV3 with our baseline and with 3 other models that we trained:
- ResNet50V2 (from scratch)
- InceptionV3 (from scratch)
- ConvNeXt
-> Best models:
- LAB_2path_InceptionResNetV2
- InceptionV3
- ConvNeXt
I also decided to fine-tune some pretrained models as a comparative study.
For these models I froze the base model and created a new classification head on top.
I then fine-tuned these 2 models (unfreezing the base model) with a much lower learning rate (1e-5 instead of 1e-3). I also compared EfficientNetV2B3 on the normal dataset and on the custom segmented dataset. A sketch of this two-phase setup follows.
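As a minimal sketch of the freeze-then-unfreeze setup in Keras (the 38-class head, the input size and the commented-out fit calls are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Phase 1: frozen base + new classification head, trained with lr = 1e-3.
base = tf.keras.applications.EfficientNetV2B3(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False

inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)  # keep BatchNorm layers in inference mode
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(38, activation="softmax")(x)  # 38 classes assumed
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=...)

# Phase 2: unfreeze the base and fine-tune with a much lower learning rate.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=...)
```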
I also fine-tuned pretrained Vision Transformer models (from Hugging Face), loaded as sketched after the list:
- ViT: vit-base-patch16-224
- ConvNeXt: convnext-tiny-224
- Swin: swin-tiny-patch4-window7-224
- CvT: cvt-13
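As a minimal sketch, each checkpoint can be loaded with a fresh classification head via the transformers library (num_labels=38 is an assumption; the same pattern applies to the other checkpoints):

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification

processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=38,                 # assumed number of classes
    ignore_mismatched_sizes=True,  # replace the 1000-class ImageNet head
)
```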
- The Vision Transformer is the most promising transformer on our dataset.
- However, as the transformer models are quite heavy and compute-hungry, I decided to focus only on CNN models.
Comparing top-k model metrics
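As a minimal sketch, the metrics we compare (accuracy and F1) can be computed with scikit-learn; the label arrays below are stand-ins, not our actual predictions:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Stand-in hold-out labels and predictions (integer class ids).
y_true = np.array([0, 1, 2, 2, 1])
y_pred = np.array([0, 1, 2, 1, 1])

accuracy = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro")
print(f"accuracy={accuracy:.4f}  macro F1={macro_f1:.4f}")
```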
Results:
Interpretability with Grad-CAM
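A minimal Grad-CAM sketch following the widely used Keras recipe (the last-conv-layer name depends on the chosen model and is passed in as an argument; nothing here is specific to our trained weights):

```python
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index=None):
    """Weight the last conv layer's feature maps by the gradient of the
    class score, then average and ReLU to obtain a heatmap."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output])

    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])
        if class_index is None:
            class_index = tf.argmax(preds[0])
        class_score = preds[:, class_index]

    grads = tape.gradient(class_score, conv_out)     # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))  # global-average-pool grads
    heatmap = tf.reduce_sum(conv_out[0] * weights, axis=-1)
    heatmap = tf.nn.relu(heatmap) / (tf.reduce_max(heatmap) + 1e-8)
    return heatmap.numpy()
```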