Plant Disease Classification
Overview
Crop diseases are a major threat to food production and food security, yet their identification remains difficult. Given an image of a plant leaf, we want to determine whether it is sick and, if so, identify the corresponding disease.
We will be using a dataset of more than 50,000 images of healthy and diseased plant leaves, collected under controlled conditions. The project's aim was to classify 26 diseases (plus a "healthy" category) across 14 different plants. We will be training different deep learning models.
Different datasets were generated:
- images with no augmentation
- images with augmentation (techniques: image flipping, gamma correction, noise injection, PCA color enhancement, rotation and scaling)
- images with augmentation & background removal (segmentation)
- images with augmentation in CIE LAB format instead of RGB color coordinates
We will be using the different augmented datasets.
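As an illustration of the augmentation techniques listed above, here is a minimal sketch using the albumentations library; the library choice and all parameter values are assumptions, not the exact settings used to build the datasets.

```python
import numpy as np
import albumentations as A

# Stand-in leaf image; in practice this would be a loaded dataset sample.
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)

# Illustrative pipeline covering the listed techniques.
augment = A.Compose([
    A.HorizontalFlip(p=0.5),                                      # image flipping
    A.VerticalFlip(p=0.5),
    A.RandomGamma(gamma_limit=(80, 120), p=0.5),                  # gamma correction
    A.GaussNoise(p=0.3),                                          # noise injection
    A.FancyPCA(alpha=0.1, p=0.3),                                 # PCA color augmentation
    A.ShiftScaleRotate(scale_limit=0.1, rotate_limit=30, p=0.5),  # rotation & scaling
])

augmented_image = augment(image=image)["image"]
```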
Our best model (LAB_2path_InceptionResNetV2) achieves an accuracy of 99.86% and an F1 score of 99.86% on our hold-out test set.
Dataset Class Distribution

- Our dataset is imbalanced:
- Orange_citrus_greening, Tomato_yellow_leaf_curl and Soybean_healthy are predominant (more than 5,000 images each). These classes are overrepresented, containing on average more than double the number of images of the other classes.
- Peach_bacterial_spot and Tomato_bacterial_spot are also heavily represented in the dataset.
Plant Distribution
- Soybean, Raspberry and Blueberry do not have a disease label.
- Squash and Orange do not have a healthy label.
- Among the plants, tomato, peach and apple have notably imbalanced class distributions.
Background Removal
I generated a dataset with background segmentation to see if our model would perform better on segmented leaves. However, since no segmentation masks were available, I implemented a custom background segmentation pipeline:
- Color cast removal using HSV filters + morphological transformations
- Opening: erosion followed by dilation
- Closing: dilation followed by erosion
- Optional:
- Contrast Limited Adaptive Histogram Equalization (CLAHE)
- Adjusting lightness with filters on the CIE LAB L channel
- Distance transformation
- The segmented image we used -> "back segm image" (the second-to-last image)
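As a minimal sketch of the core HSV masking + morphology step (the HSV threshold values and kernel size are illustrative assumptions, not the exact ones used):

```python
import cv2
import numpy as np

def remove_background(image_bgr,
                      lower_hsv=(25, 40, 40), upper_hsv=(95, 255, 255)):
    """Mask green-ish hues in HSV, then clean the mask with morphology."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(lower_hsv), np.array(upper_hsv))

    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    # Opening (erosion then dilation) removes small background speckles.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    # Closing (dilation then erosion) fills small holes inside the leaf.
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    # Keep only the pixels inside the mask; the background goes to black.
    return cv2.bitwise_and(image_bgr, image_bgr, mask=mask)
```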
- The mask on greenish colors also captures bluish colors, as some diseases have this color (see below).
- The custom background removal doesn't work perfectly on images where the background color is similar to a disease or leaf color.
- The background removal also has difficulties with shadows and light reflections.
Defining Baseline & best dataset
As the classical ML models (SVM, decision trees) don't perform well on our dataset, I decided to focus only on the deep learning models. Moreover, all the feature extraction (gray-level co-occurrence matrix, HOG, color histogram, texture, ...) required by the classical ML models takes a large amount of compute and time.
Creating Baseline
Simple CNN with 8-12 layers:
- testing different simple architectures with AlexNet as reference
- testing different normalization methods (sketched below):
- sample-wise scaling -> scale pixels between -1 and 1
- centering pixels with the training set mean
- scaling pixels between 0 and 1, then standardizing them with training set statistics (zero-centering by the mean, dividing by the standard deviation)
=> baseline: baseline_conv_sample_scale
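As a minimal sketch in NumPy, with a stand-in array in place of the real training set, the three normalization schemes look like this:

```python
import numpy as np

# Stand-in batch of raw images (pixels in [0, 255]).
x_train = np.random.randint(0, 256, (8, 128, 128, 3)).astype("float32")

# 1) Sample-wise scaling: map pixels to [-1, 1].
x_sample_scaled = x_train / 127.5 - 1.0

# 2) Centering: subtract the training set mean.
x_centered = x_train - x_train.mean()

# 3) Scale to [0, 1], then standardize with training set statistics.
x_01 = x_train / 255.0
x_standardized = (x_01 - x_01.mean()) / x_01.std()
```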
Selecting most relevant dataset
- basic (highly imbalanced): sizes 128 & 224
- augmented (reduced class imbalance): sizes 128 & 224
Best dataset => augmented with size 224
Augmentation:
- improves the model's performance
- enhances our model's ability to recognize new variants of the training data
- improves the model's ability to generalize to unseen images
Selected dataset => augmented with size 128:
- 128 -> for reduced computational cost
Selecting top-k models
I re-implemented and tested the CNN model from the paper "Reliable Deep Learning Plant Leaf Disease Classification Based on Light-Chroma Separated Branches".
In this paper, the authors propose a method that converts input images from RGB to CIE LAB coordinates. They then feed the achromatic L channel and the chromatic AB channels into 2 separate branches along the first 3 layers of a modified Inception V3. Afterwards they concatenate these 2 branches and feed the result to the following 5 inception blocks of the Inception V3. Finally they add a classification block (global average pooling followed by a softmax layer). This method aims to improve the model's classification reliability when the original RGB images are perturbed with several types of noise (salt and pepper, blurring, motion blurring and occlusions), which simulate common image variability found in the natural environment.
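Converting an RGB image to CIE LAB is straightforward with OpenCV (a minimal sketch; "leaf.jpg" is a placeholder path):

```python
import cv2

bgr = cv2.imread("leaf.jpg")                # placeholder path; OpenCV loads BGR
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)  # convert to CIE LAB
l_channel, a, b = cv2.split(lab)            # achromatic L + chromatic A, B
```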

Following this paper, I implemented a similar model based on the tiny InceptionResNet architecture.
For both models we compare 2 branch variants (a minimal sketch of the split follows the list):
- 20% L - 80% AB
- 50% L - 50% AB
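This sketch assumes the percentages denote the share of convolutional filters allocated to each branch; the stem depth, filter counts, input size and 38-class head are illustrative assumptions, not the actual architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

def two_path_stem(inputs, total_filters=64, l_share=0.2):
    # Split the LAB input into the achromatic L and chromatic AB channels.
    l_channel = layers.Lambda(lambda x: x[..., :1])(inputs)
    ab_channels = layers.Lambda(lambda x: x[..., 1:])(inputs)

    # Allocate filters between the branches according to the chosen ratio.
    n_l = int(total_filters * l_share)
    n_ab = total_filters - n_l

    l_path = layers.Conv2D(n_l, 3, strides=2, activation="relu")(l_channel)
    l_path = layers.Conv2D(n_l, 3, activation="relu")(l_path)

    ab_path = layers.Conv2D(n_ab, 3, strides=2, activation="relu")(ab_channels)
    ab_path = layers.Conv2D(n_ab, 3, activation="relu")(ab_path)

    # Merge the two branches before the shared inception blocks.
    return layers.Concatenate()([l_path, ab_path])

inputs = tf.keras.Input(shape=(128, 128, 3))         # LAB image
x = two_path_stem(inputs, l_share=0.2)               # 20% L / 80% AB variant
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(38, activation="softmax")(x)  # 38 classes assumed
model = tf.keras.Model(inputs, outputs)
```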
From these models we kept the one with the highest validation F1 score:
- LAB_2path_InceptionResNetV2 with 20% L - 80% AB.
We compare the LAB_2path_InceptionResNetV2 and the LAB_2path_InceptionV3 with our baseline and with 3 other models that we trained:
- ResNet50V2 (from scratch)
- InceptionV3 (from scratch)
- ConvNeXt
-> Best models:
- LAB_2path_InceptionResNetV2
- InceptionV3
- ConvNeXt
I also decided to fine-tune some pretrained models as a comparative study.
For these models I froze the base model and created a new classification head on top.
I then fine-tuned these 2 models (unfreezing the base model) with a much lower learning rate (1e-5 instead of 1e-3). I also compared EfficientNetV2B3 on the normal dataset and on the custom segmented dataset. A sketch of this two-phase setup follows.
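As a minimal sketch of the freeze-then-unfreeze setup in Keras (the 38-class head, the input size and the commented-out fit calls are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Phase 1: frozen base + new classification head, trained with lr = 1e-3.
base = tf.keras.applications.EfficientNetV2B3(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False

inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)  # keep BatchNorm layers in inference mode
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(38, activation="softmax")(x)  # 38 classes assumed
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=...)

# Phase 2: unfreeze the base and fine-tune with a much lower learning rate.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=...)
```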
I also fine-tuned pretrained Vision Transformer models (from Hugging Face), loaded as sketched after the list:
- ViT: vit-base-patch16-224
- ConvNeXt: convnext-tiny-224
- Swin: swin-tiny-patch4-window7-224
- CvT: cvt-13
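As a minimal sketch, each checkpoint can be loaded with a fresh classification head via the transformers library (num_labels=38 is an assumption; the same pattern applies to the other checkpoints):

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification

processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=38,                 # assumed number of classes
    ignore_mismatched_sizes=True,  # replace the 1000-class ImageNet head
)
```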
- The Vision Transformer is the most promising transformer on our dataset.
- However, as the transformer models are quite heavy and compute-hungry, I decided to focus only on CNN models.
Comparing top-k model metrics
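As a minimal sketch, the metrics we compare (accuracy and F1) can be computed with scikit-learn; the label arrays below are stand-ins, not our actual predictions:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Stand-in hold-out labels and predictions (integer class ids).
y_true = np.array([0, 1, 2, 2, 1])
y_pred = np.array([0, 1, 2, 1, 1])

accuracy = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro")
print(f"accuracy={accuracy:.4f}  macro F1={macro_f1:.4f}")
```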
Results:
Interpretability with Grad-CAM
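A minimal Grad-CAM sketch following the widely used Keras recipe (the last-conv-layer name depends on the chosen model and is passed in as an argument; nothing here is specific to our trained weights):

```python
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index=None):
    """Weight the last conv layer's feature maps by the gradient of the
    class score, then average and ReLU to obtain a heatmap."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output])

    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])
        if class_index is None:
            class_index = tf.argmax(preds[0])
        class_score = preds[:, class_index]

    grads = tape.gradient(class_score, conv_out)     # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))  # global-average-pool grads
    heatmap = tf.reduce_sum(conv_out[0] * weights, axis=-1)
    heatmap = tf.nn.relu(heatmap) / (tf.reduce_max(heatmap) + 1e-8)
    return heatmap.numpy()
```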