FOD Augmentations Experiments
Results
The FOD augmentation parameters for each experiment can be found here.
The experiment results can be found in this spreadsheet.
1. Introduction
To train a FOD (foreign object debris) detector, an augmented dataset must be created from images of the base material. The FOD-Augmentation GitHub repo performs data augmentations and generates a dataset with artificial FOD textures. The resulting dataset can be altered by several augmentation parameters. In these experiments, each of the parameters will be altered individually to create unique datasets for training. Isolating each parameter can help determine their effect on model performance.
A total of 38 experiments were conducted: 36 single-parameter experiments (Section 3) and 2 combined-augmentation experiments (Section 3.4).
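As background for the experiments below, the core compositing step (an elliptical FOD texture alpha-blended into the base material image) can be sketched as follows. This is a minimal illustration under assumed behaviour, not the FOD-Augmentation repo's actual code; `make_ellipse_mask` and `paste_fod` are hypothetical helper names.

```python
import numpy as np

def make_ellipse_mask(h, w, cy, cx, ry, rx):
    """Boolean mask of an axis-aligned ellipse centred at (cy, cx)."""
    yy, xx = np.ogrid[:h, :w]
    return ((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2 <= 1.0

def paste_fod(img, texture, mask, opacity):
    """Alpha-blend the FOD texture into the image inside the mask."""
    out = img.astype(float)
    out[mask] = (1 - opacity) * out[mask] + opacity * texture[mask]
    return out.astype(img.dtype)

# toy example: grey background, white texture, 50% opacity
img = np.full((480, 480, 3), 128, dtype=np.uint8)
texture = np.full((480, 480, 3), 255, dtype=np.uint8)
mask = make_ellipse_mask(480, 480, cy=240, cx=240, ry=30, rx=90)
aug = paste_fod(img, texture, mask, opacity=0.5)
```

The bounding box of each pasted ellipse then becomes the FOD label for training.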
2. Base Model
A model was first trained on a dataset augmented with a set of "baseline parameters". This baseline model was then further trained on a unique dataset for each experiment. The effect of each augmentation parameter is measured by the relative change from the baseline metrics (precision, recall, mAP_0.5, and mAP_0.5:0.95).
The baseline model was trained for 100 epochs.
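Concretely, the "relative change" reported throughout is the ordinary percentage change from the baseline metric; a small illustrative helper:

```python
def relative_change(baseline, experiment):
    """Percent change of an experiment's metric relative to the baseline."""
    return (experiment - baseline) / baseline * 100

# e.g. a recall of 0.88 against a baseline recall of 0.80 is a +10% relative increase
print(round(relative_change(0.80, 0.88), 2))  # → 10.0
```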
Baseline augmentation parameters:
- General parameters:

```python
# General parameters
n_samples = 400            # num of samples to take from imgs folder
n_augments = 5             # num of times to augment one sample
img_w = 480                # image width size to resize to
img_h = 480                # image height size to resize to
use_orig_img_size = False  # if true, disregard above image sizes and use original
n_fod = 5                  # num of fod to add to the image
wh_thresh = 5              # width and height threshold for bboxes

# ranges for ellipse creation
area_ranges = [
    [0.05 * np.pi / 4 * (15 ** 2), 0.05 * np.pi / 4 * (200 ** 2)],   # small fod range
    [0.05 * np.pi / 4 * (201 ** 2), 0.05 * np.pi / 4 * (500 ** 2)],  # larger fod range
]
aspect_ranges = [[1, 7], [1, 7]]
ranges_weights = [0.5, 0.5]  # ith weight is the probability that the ith area/aspect ranges will be chosen
ranges_indices = [0, 1]      # indices of the different ranges
opacity_range = [0.35, 1.0]
```

- Original material image colour transform:

```python
# Original Image Colour Jitter
orig_img = ColorJitter(brightness=0.1,
                       contrast=0.1,
                       saturation=0.1,
                       hue=0.05)(orig_img)
```

- FOD textures colour transform:

```python
# FOD Textures Colour Jitter
texture_img = ColorJitter(brightness=0.8,  # random_color
                          contrast=0.8,
                          saturation=0.8,
                          hue=0.5)(texture_img)
```

- Final image brightness:

```python
# Final Image Brightness
image = ColorJitter(brightness=0.7)(image)
```
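For context on the ellipse ranges above: an ellipse with area A = π·a·b and aspect ratio r = a/b has semi-axes a = √(A·r/π) and b = √(A/(π·r)). A hedged sketch of how a FOD ellipse could be sampled from `area_ranges` and `aspect_ranges` (the repo's actual sampling code may differ):

```python
import math
import random

def sample_ellipse(area_range, aspect_range, rng=random):
    """Sample semi-axes (a, b) of an ellipse with uniformly drawn area and aspect ratio."""
    area = rng.uniform(*area_range)           # target ellipse area
    aspect = rng.uniform(*aspect_range)       # aspect ratio a / b
    b = math.sqrt(area / (math.pi * aspect))  # minor semi-axis
    a = aspect * b                            # major semi-axis
    return a, b

# baseline small-FOD range: 0.05 * pi/4 * d**2 for d in [15, 200], aspect in [1, 7]
small_range = [0.05 * math.pi / 4 * 15 ** 2, 0.05 * math.pi / 4 * 200 ** 2]
a, b = sample_ellipse(small_range, [1, 7], random.Random(0))
assert small_range[0] <= math.pi * a * b <= small_range[1]  # sampled area stays in range
```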
2.1 Model Performance
- Training metrics:
- Augmented validation set metrics (Baseline Model):

```
Class  Images  Labels      P      R  mAP@.5  mAP@.5:.95
  all     215     895  0.938  0.873   0.917       0.549
```

- Toyota-twill test set metrics (Baseline Model):

```
Class  Images  Labels      P      R  mAP@.5  mAP@.5:.95
  all     192     607  0.895  0.815    0.88       0.594
```
3. Experiments
Every augmentation parameter listed under general parameters, original material image colour jitter, FOD textures colour jitter, and final image brightness (Section 2) was changed individually to isolate its effect on model performance. Each alteration created a new augmented dataset used to further train the baseline model. Each model's performance is evaluated by the relative increase/decrease in the metrics.
A total of 36 experiments were conducted (each parameter alteration = 1 new experiment).
The specific augmentation settings for each experiment can be found here.
The models were trained for 75 epochs.
3.1 Training Model Performance
- Training runs:
The training results are summarized in page 1 of this spreadsheet.
Observing the relative changes in the metrics, every experiment improved the model's performance, with the lone exception of a small drop in recall for Experiment 5.
Precision
- Experiments 9, 30, 20, 26, and 14 improved precision the greatest with 6.11%, 5.51%, 5.50%, 5.50%, and 5.40% relative increase respectively.
- Experiments 24, 5, and 12 improved precision the least with 2.36%, 2.79%, and 2.88% relative increase respectively.
Recall
- Experiment 5 resulted in a 0.5% decrease in recall from the baseline model.
- This result is expected as decreasing the size of the smaller FOD makes it more difficult for the model to detect.
- Experiments 6, 11, 14, and 26 improved recall the greatest with 8.18%, 6.87%, 6.21%, and 6.06% relative increase respectively.
- Experiments 13 and 35 improved recall the least with 1.03% and 1.54% relative increase respectively.
mAP_0.5
- Experiments 6 and 14 improved mAP_0.5 the greatest with 8.56% and 8.14% relative increase respectively.
- Experiments 5 and 13 improved mAP_0.5 the least with 0.93% and 2.80% relative increase respectively.
mAP_0.5:0.95
- Experiments 11, 4, 17, and 16 improved mAP_0.5:0.95 the greatest with 39.17%, 35.22%, 34.79%, and 33.59% relative increase respectively.
- Experiments 5 and 15 improved mAP_0.5:0.95 the least with 11.41% and 15.46% relative increase respectively.
3.2 Model Performance on Augmented Validation Set
All 36 models were evaluated on the same augmented validation set, which was taken from the base model's dataset (refer to Section 2 for parameters). Bayesian optimization was run on the base model over this validation set to determine the optimal confidence and IoU thresholds.
The thresholds used are:
- conf_thresh = 0.3974
- iou_thresh = 0.5393
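The report's threshold search used Bayesian optimization; as a simpler, self-contained stand-in, the same idea can be sketched as a grid search maximizing F1 on the validation set. The `evaluate` function here is a hypothetical placeholder for running the detector at given thresholds:

```python
import itertools

def f1(p, r):
    """F1-score: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

def search_thresholds(evaluate, conf_grid, iou_grid):
    """Return the (conf, iou) pair with the highest validation F1.
    `evaluate(conf, iou)` must return (precision, recall)."""
    return max(itertools.product(conf_grid, iou_grid),
               key=lambda t: f1(*evaluate(*t)))

# toy evaluate: precision/recall peak near conf=0.4, iou=0.55
def evaluate(conf, iou):
    return 1 - abs(conf - 0.4), 1 - abs(iou - 0.55)

conf_grid = [0.2, 0.3, 0.4, 0.5]
iou_grid = [0.45, 0.55, 0.65]
print(search_thresholds(evaluate, conf_grid, iou_grid))  # → (0.4, 0.55)
```

Bayesian optimization replaces the exhaustive grid with a surrogate model that proposes promising threshold pairs, which matters when each evaluation (a full validation pass) is expensive.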
The validation results are summarized in page 2 of this spreadsheet.
Overall, every augmentation improved the performance of the baseline model on the validation set.
Precision
- Experiments 33, 26, and 1 improved precision the greatest with 4.48%, 4.26%, and 4.16% relative increase respectively.
- Experiments 27, 24, and 28 improved precision the least with 1.92%, 2.24%, and 2.24% relative increase respectively.
Recall
- Experiments 11, 24, and 31 improved recall the greatest with 5.04%, 4.35%, and 4.35% relative increase respectively.
- Experiments 12 and 35 improved recall the least with 1.72% and 1.72% relative increase respectively.
mAP_0.5
- Experiments 5, 4, and 11 improved mAP_0.5 the greatest with 4.69%, 4.36%, and 4.14% relative increase respectively.
- Experiments 34, 12, 27, 28, and 35 improved mAP_0.5 the least with 2.51%, 2.62%, 2.62%, 2.62%, and 2.62% relative increase respectively.
mAP_0.5:0.95
- Experiments 17 and 33 improved mAP_0.5:0.95 the greatest with 35.70% and 30.42% relative increase respectively.
- Experiments 15, 12, 22, and 29 improved mAP_0.5:0.95 the least with 16.03%, 19.85%, 19.85%, and 19.85% relative increase respectively.
3.3 Model Performance on Toyota-Twill Test Set
All 36 models were evaluated on the Toyota-Twill test set, using the same confidence and IoU thresholds as in Section 3.2.
The Toyota-Twill test results are summarized in page 3 of this spreadsheet.
Interestingly, many of the data augmentations decreased the performance of the baseline model on the Toyota-Twill test set.
Precision
- Experiments 12, 13, 27, 9, and 30 improved precision with 3.02%, 2.68%, 2.01%, 1.68%, and 1.23% relative increase respectively.
- Experiments 10, 36, 33, and 24 decreased precision the greatest with 19.33%, 14.08%, 13.86%, and 12.40% relative decrease respectively.
- Every other experiment decreased precision.
Recall
- Experiments 20, 12, 5, and 13 improved recall the greatest with 6.26%, 4.66%, 3.93%, and 3.31% relative increase respectively.
- Experiments 2, 6, 27, 28, and 31 also improved recall.
- Experiments 29, 4, 24, 31, and 34 decreased recall the greatest with 22.58%, 17.91%, 17.06%, 16.08%, and 15.46% relative decrease respectively.
- Every other experiment decreased recall.
mAP_0.5
- Experiments 12, 20, and 13 improved mAP_0.5 the greatest with 3.75%, 3.18%, and 2.84% relative increase respectively.
- Experiments 5, 27, 28, and 30 also improved mAP_0.5.
- Experiments 29, 4, 10, and 24 decreased mAP_0.5 the greatest with 13.75%, 13.41%, 13.07%, and 12.61% relative decrease respectively.
- Every other experiment decreased mAP_0.5.
mAP_0.5:0.95
- Experiments 9, 13, 3, and 6 improved mAP_0.5:0.95 the greatest with 5.56%, 5.56%, 4.38%, and 3.70% relative increase respectively.
- Experiments 12, 15, and 32 also improved mAP_0.5:0.95.
- Experiments 18, 10, 24, and 29 decreased mAP_0.5:0.95 the greatest with 20.88%, 18.18%, 17.85%, and 17.17% relative decrease respectively.
- Every other experiment decreased mAP_0.5:0.95.
3.4 Combining Best Augmentations
Experiments that improved F1-score on the Toyota-Twill Test Set were chosen to test their additive effects.
Experiments that improved F1-score on the Toyota-Twill Test Set:
- Experiment 5 (decreasing small FOD upper range / large FOD lower range by 100)
- Experiment 12 (increasing aspect range of large FOD)
- Experiment 13 (decreasing lower limit of opacity)
- Experiment 20 (decreasing contrast of original image)
- Experiment 27 (increasing brightness of FOD texture)
- Experiment 28 (decreasing brightness of FOD texture)
- Experiment 30 (decreasing contrast of FOD texture)
Since Experiments 27 and 28 involve increasing and decreasing the same parameter, the combination is split into two experiments (Experiments 37 and 38), each including one of the two brightness changes.
On the Toyota-Twill Test Set:
- Experiment 37 improved F1-score by 3.50%.
- Experiment 38 improved F1-score by 6.53%.
The additive effects of the augmentations in Experiment 38 improved the baseline model's performance further than the individual augments.
Although Experiment 37 improved baseline performance, Experiment 12 alone resulted in better performance by 0.375%.
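For reference, the F1-score used to compare these combined experiments is the harmonic mean of precision and recall; applied here, for illustration, to the baseline Toyota-twill metrics from Section 2.1:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# baseline Toyota-twill test metrics: P = 0.895, R = 0.815
print(round(f1_score(0.895, 0.815), 4))  # → 0.8531
```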
4. Conclusions
Training
- Overall, every experiment resulted in performance gain (the lone exception is a small drop in recall for Experiment 5).
- FOD Area Range
- Experiment 5 (decreasing small FOD upper range / large FOD lower range by 100) resulted in the lowest performance gain for every metric.
- Saw 0.51% relative decrease in recall.
- Decreasing the size makes it more difficult for the model to detect the smaller FOD.
- Experiment 6 (increasing small FOD upper range / large FOD lower range by 100) resulted in the highest performance gain in recall (8.18%) and mAP_0.5 (8.56%).
- Increasing the size of the small FOD makes it easier for the model to detect.
- FOD Aspect Range
- Experiment 9 (decreasing aspect range of small FOD) resulted in the highest performance gain in precision (6.11%).
- Experiment 11 (decreasing aspect range of large FOD) resulted in good performance gain in recall (6.87%).
- Possibly because stretching the larger FOD makes its bounding box larger (largest when the FOD is oriented at 45 degrees); this bounding box can overlap with the small FODs and cause more false negatives.
- Experiment 12 (increasing aspect range of large FOD) resulted in low performance gain in precision (2.88%).
- Opacity Range
- Experiment 13 (decreasing lower limit of opacity) resulted in low performance gain in recall (1.03%) and mAP_0.5 (2.80%).
- Model was unable to detect really transparent FOD.
- Experiment 14 (increasing lower limit of opacity) resulted in good performance gain in precision (5.40%), recall (6.21%), and mAP_0.5 (8.14%).
- Colour Jitter
- Overall, for colour jitter parameters for both original image and FOD textures, decreasing the contrast resulted in performance gain in precision.
- Decreasing the hue in the original image resulted in the lowest performance gain in precision (2.36%).
- Final Image Brightness
- Experiment 35 (decreasing the brightness by 0.2) resulted in low performance gain in recall (1.54%).
Augmented Validation Set
- Every experiment resulted in performance gain.
- FOD Aspect Range
- Similar to training results, Experiment 11 resulted in highest performance gain in recall (5.04%).
- Experiment 12 resulted in lowest performance gain in recall (1.72%).
- Colour Jitter
- Similar to training results, decreasing the hue for the original image resulted in low performance gain in precision (2.24%).
- However, it resulted in good performance gain in recall (4.35%).
- Experiments 27 and 28 (increasing and decreasing FOD texture brightness) resulted in the lowest performance gain in precision (1.92% and 2.24%).
- Experiment 31 (increasing saturation of FOD texture) resulted in good performance gain in recall (4.35%).
- Experiment 33 (increasing hue of FOD texture) resulted in highest performance gain in precision (4.48%).
- Experiment 34 (decreasing hue of FOD texture) resulted in low performance gain in recall (2.18%).
- Final Image Brightness
- Experiment 35 (decreasing the brightness by 0.2) resulted in lowest performance gain in recall (1.72%).
Toyota-Twill Test Set
- Most experiments resulted in a performance drop.
- Experiments which improved performance
- Interestingly, Experiment 12 (increasing aspect range of large FOD) performed the best.
- Improved precision (3.02%), recall (4.66%), mAP_0.5 (3.75%), and mAP_0.5:0.95 (0.84%).
- Experiment 13 (decreasing lower limit of opacity) also performed well.
- Improved precision (2.68%), recall (3.31%), mAP_0.5 (2.84%), and mAP_0.5:0.95 (5.56%).
- Experiments 27 and 28 (increasing and decreasing brightness of FOD texture) improved precision, recall, and mAP_0.5.
- Experiment 30 (decreasing contrast of FOD texture) improved precision, recall, and mAP_0.5.
- Experiment 2 (increasing number of augmentations) improved recall (1.47%), but reduced the other metrics.
- Experiment 5 (decreasing small FOD upper range / large FOD lower range by 100) improved recall (3.93%) and mAP_0.5 (1.56%), but reduced precision.
- Experiment 9 (decreasing aspect range of small FOD) improved precision (1.68%), but reduced the other metrics.
- Experiment 20 (decreasing contrast of original image) improved recall (6.26%) and mAP_0.5 (3.18%), but slightly reduced precision (0.22%).
- Experiment 37 improved F1-score (3.50%).
- Experiment 38 improved F1-score (6.53%).
- The additive effects of Experiments 5, 12, 13, 20, 28, and 30 improved F1-score more than the individual augments alone.
- Experiments which reduced performance
- Every experiment except those listed above reduced performance.