
Plant Disease Classification

Created on July 6 | Last edited on November 8


Overview

Crop diseases are a major threat to food production and food security, yet their identification remains difficult. Given an image of a plant leaf, we want to determine whether it is diseased and, if so, identify the corresponding disease.
We will be using a dataset of more than 50,000 images of healthy and diseased plant leaves, collected under controlled conditions. The project's aim was to classify 26 diseases (plus a "healthy" category) across 14 different plants. We will train several deep learning models.
Different datasets were generated:
  • images with no augmentation
  • images with augmentation (techniques: image flipping, gamma correction, noise injection, PCA color enhancement, rotation, and scaling; sketched below)
  • images with augmentation & background removal (segmentation)
  • images with augmentation in CIE LAB format instead of RGB color coordinates
We will be using the different augmented datasets.
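As a rough illustration of such an augmentation pipeline, here is a minimal sketch using Albumentations; the operations mirror the techniques listed above, but the probabilities and limits are illustrative guesses, not the project's actual values:

```python
import albumentations as A

# Hypothetical pipeline mirroring the listed augmentation techniques;
# all parameters below are assumptions.
augment = A.Compose([
    A.HorizontalFlip(p=0.5),                      # image flipping
    A.VerticalFlip(p=0.5),
    A.RandomGamma(gamma_limit=(80, 120), p=0.5),  # gamma correction
    A.GaussNoise(p=0.3),                          # noise injection
    A.FancyPCA(alpha=0.1, p=0.3),                 # PCA color augmentation
    A.Rotate(limit=30, p=0.5),                    # rotation
    A.RandomScale(scale_limit=0.2, p=0.5),        # scaling
])

augmented = augment(image=image)["image"]  # image: HxWx3 uint8 RGB array
```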
Our best model (LAB_2path_InceptionResNetV2) achieves an accuracy of 99.86% and an F1 score of 99.86% on our hold-out test set.


Dataset Class Distribution


  • Our dataset is imbalanced:
    • Orange_citrus_greening, Tomato_yellow_leaf_curl, and Soybean_healthy are predominant (more than 5,000 images each). These classes are overrepresented, containing on average more than double the number of images of the other classes.
    • Peach_bacterial_spot and Tomato_bacterial_spot are also heavily represented in the dataset.


Plant Distribution















  • Soybean, Raspberry and Blueberry do not have a disease label.
  • Squash and Orange do not have a healthy label.
  • Among the plants, tomato, peach, and apple are notably imbalanced.


Background Removal

I generated a dataset with background segmentation to see if our model would perform better on segmented leaves. However, since no segmentation masks were available, I implemented a custom background segmentation pipeline (sketched after this list):
  • Color cast removal using HSV filters + morphological transformations
    • Opening: erosion followed by dilation
    • Closing: dilation followed by erosion
  • Optional:
    • Contrast Limited Adaptive Histogram Equalization (CLAHE)
    • Adjusting lightness with filters on the CIE LAB L channel
    • Distance transformation
  • The segmented image we used is the "back segm image" (second-to-last image below).
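A minimal OpenCV sketch of this kind of HSV-mask plus opening/closing segmentation, assuming a BGR input image; the hue range and kernel size are illustrative guesses that would need tuning per dataset:

```python
import cv2
import numpy as np

def remove_background(img_bgr, h_range=(25, 95)):
    """Rough leaf segmentation: keep 'greenish' pixels via an HSV mask,
    then clean the mask with morphological opening and closing."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([h_range[0], 40, 40])    # assumed hue/sat/val bounds
    upper = np.array([h_range[1], 255, 255])
    mask = cv2.inRange(hsv, lower, upper)

    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    # opening (erosion then dilation) removes small background speckles
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    # closing (dilation then erosion) fills small holes inside the leaf
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    return cv2.bitwise_and(img_bgr, img_bgr, mask=mask)
```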







The mask targeting greenish colors also captures bluish colors, since some diseases show this hue (see below).


  • The custom background removal doesn't work well on images where the background color is similar to a disease or leaf color.
  • The background removal also struggles with shadows and light reflections.


Defining the Baseline & Best Dataset

Since classical ML models (SVM, decision trees) perform poorly on our dataset, I decided to focus solely on deep learning models. Moreover, the feature extraction required for classical ML models (graycomatrix, HOG, color histograms, texture, ...) takes a large amount of compute and time.

Creating Baseline

Simple CNN, 8-12 layers:
  • testing different simple architectures with AlexNet as a reference
  • testing different normalization methods (sketched below):
    • sample-wise scaling -> scale pixels to [-1, 1]
    • centering pixels using the training-set mean
    • scaling pixels to [0, 1] and standardizing with training-set statistics (zero-centering by the mean)
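A minimal sketch of the three normalization options; the function names are mine, and `train_mean`/`train_std` are assumed to be computed on the training set only:

```python
import numpy as np

def sample_wise_scale(x):
    # scale raw [0, 255] pixels to [-1, 1]
    return x.astype(np.float32) / 127.5 - 1.0

def center(x, train_mean):
    # zero-center pixels using the training-set mean
    return x.astype(np.float32) - train_mean

def standardize(x, train_mean, train_std):
    # scale to [0, 1], then standardize with training-set statistics
    # (train_mean/train_std computed on the [0, 1]-scaled training images)
    x = x.astype(np.float32) / 255.0
    return (x - train_mean) / train_std
```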


=> baseline: baseline_conv_sample_scale


Selecting most relevant dataset

  • basic (highly imbalanced): sizes 128 & 224
  • augmented (reduced class imbalance): sizes 128 & 224


Best dataset => augmented with size 224
Augmentation:
  • improves the model's performance
  • enhances the model's ability to recognize new variants of the training data
  • improves the model's ability to generalize to unseen images
Selected dataset => augmented with size 128:
  • 128 -> for reduced computational cost

Selecting top-k models

In this paper, the authors propose a method that converts input images from RGB to CIE LAB coordinates. They then feed the achromatic L and chromatic AB channels into 2 separate branches through the first 3 layers of a modified InceptionV3. Afterwards, they concatenate the 2 branches and feed them into the following 5 inception blocks of the InceptionV3. Finally, they add a classification block (global average pooling followed by a softmax layer). This method aims to improve the model's reliability when the original RGB images are perturbed with several types of noise (salt and pepper, blurring, motion blurring, and occlusions), which simulate common image variability found in natural environments.




Following this paper, I implemented a similar model based on the tiny InceptionResNet architecture.


For both models, we compare 2 branch variants (sketched below):
  • 20% L - 80% AB
  • 50% L - 50% AB
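A minimal Keras sketch of the two-path idea: the early blocks here are simplified stand-ins for the backbone's first layers, `l_ratio` controls the 20%/80% vs. 50%/50% filter split between the branches, and the input is assumed to already be converted to CIE LAB with 38 total classes:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def early_block(x, filters):
    # simplified stand-in for the backbone's first conv/inception layers
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.MaxPooling2D()(x)

def build_lab_2path(input_shape=(224, 224, 3), num_classes=38, l_ratio=0.2):
    inputs = tf.keras.Input(shape=input_shape)              # images in CIE LAB
    l_chan = layers.Lambda(lambda t: t[..., :1])(inputs)    # achromatic L
    ab_chan = layers.Lambda(lambda t: t[..., 1:])(inputs)   # chromatic AB

    total = 64  # illustrative filter budget for the early layers
    l_branch = early_block(l_chan, max(1, int(total * l_ratio)))
    ab_branch = early_block(ab_chan, int(total * (1 - l_ratio)))

    x = layers.Concatenate()([l_branch, ab_branch])
    # ... the remaining inception blocks of the backbone would go here ...
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return Model(inputs, outputs)

model = build_lab_2path(l_ratio=0.2)  # the 20% L - 80% AB variant
```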


From these models, we kept the one with the highest validation F1 score:
  • LAB_2path_InceptionResNetV2 with 20% L - 80% AB.
We compare the LAB_2path_InceptionResNetV2 and the LAB_2path_InceptionV3 with our baseline and with 3 other models that we trained:
  • ResNet50V2 (from scratch)
  • InceptionV3 (from scratch)
  • ConvNeXt


-> Best models:
  • LAB_2path_InceptionResNetV2
  • InceptionV3
  • ConvNeXt

I also decided to fine-tune some pretrained models as a comparative study.
For these models, I froze the base model and created a new classification head on top (a minimal sketch follows).
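A minimal Keras sketch of this setup, using EfficientNetV2B3 as an example; the head layers, dropout rate, and class count are assumptions, not the exact configuration used here:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Pretrained base with the ImageNet top removed
base = tf.keras.applications.EfficientNetV2B3(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained base

inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)  # keep BatchNorm layers in inference mode
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.2)(x)
# assuming 38 total classes (26 diseases + 12 healthy)
outputs = layers.Dense(38, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])
```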


I then fine-tuned these 2 models (unfreezing the base model) with a much lower learning rate (1e-5 instead of 1e-3). I also compared EfficientNetV2B3 on the normal dataset and on the custom segmented dataset.
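Continuing the sketch above, the fine-tuning step simply unfreezes the base and recompiles with the lower learning rate (`train_ds`/`val_ds` are assumed to be already-defined tf.data pipelines):

```python
# unfreeze the pretrained base and recompile with a much lower LR
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),  # 1e-5 instead of 1e-3
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=10)   # epoch count illustrative
```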



I also fine-tuned pretrained Vision Transformer models from Hugging Face (loading sketched below):
  • ViT: vit-base-patch16-224
  • ConvNeXt: convnext-tiny-224
  • Swin: swin-tiny-patch4-window7-224
  • CvT: cvt-13
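A minimal Hugging Face sketch of how one of these checkpoints can be loaded for fine-tuning on our label set; `num_labels=38` is an assumption about our class count:

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification

checkpoint = "google/vit-base-patch16-224"
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForImageClassification.from_pretrained(
    checkpoint,
    num_labels=38,                 # replace the 1000-class ImageNet head
    ignore_mismatched_sizes=True,  # allow the freshly initialized head
)
```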


  • The Vision Transformer is the most promising transformer on our dataset.
  • However, as the transformer models are quite heavy and compute-hungry, I decided to focus only on CNN models.


Comparing top-k model metrics





Results:




Interpretability with Grad-CAM
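A minimal Grad-CAM sketch for a Keras CNN; `last_conv_name` is whatever the model's last convolutional layer is called, and this is the standard formulation rather than necessarily the exact code used here:

```python
import tensorflow as tf

def grad_cam(model, img, last_conv_name, class_idx=None):
    """Weight the last conv feature maps by the gradient of the target
    class score, sum them, then ReLU and normalize to get a heatmap."""
    conv_layer = model.get_layer(last_conv_name)
    grad_model = tf.keras.Model(model.inputs, [conv_layer.output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(img[None, ...])  # add batch dimension
        if class_idx is None:
            class_idx = tf.argmax(preds[0])           # top predicted class
        score = preds[:, class_idx]
    grads = tape.gradient(score, conv_out)               # d(score)/d(conv maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))      # GAP of the gradients
    cam = tf.reduce_sum(conv_out[0] * weights, axis=-1)  # weighted feature maps
    cam = tf.nn.relu(cam) / (tf.reduce_max(cam) + 1e-8)  # keep positive, normalize
    return cam.numpy()                                   # HxW heatmap in [0, 1]
```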