
Tackling Water Pollution using YOLO-NAS and W&B

See how you can get started using the WandB ecosystem to solve end-to-end problems.
Created on November 19 | Last edited on January 9

Introduction

This report is meant to serve as an introduction to an end-to-end object detection workflow on a semi-barebones level. We'll be building a model to help combat the real-world problem of water pollution and including a pair of colabs so you can re-create this experiment on your own end.
At a high level, here's what we'll be covering:

Table of Contents

  • The Problem
  • The Dataset
  • Exploratory Data Analysis
  • The Model
  • Experimentations and Actionable Insights from our Baselines
  • Conclusion

The Problem

Water pollution is an extremely challenging problem. An estimated 5.25 trillion pieces of plastic are currently floating in Earth's seas and oceans, and that number rises year over year. Animals and fish ingest these plastics and microfibers, causing severe health problems at both local and global scales. And since many of these animals sit low on the food chain, plastics quickly move up it, affecting apex predators and, yes, human beings.
Additionally, the presence of plastic degrades water quality across the board, reducing access to safe drinking water in marginalized communities. Toxins released over time by plastics and other trash also harm the marine biodiversity of a region.
There's no single fix for this problem, of course, but detecting this waste is a vital step toward cleaning it up. Today, we're going to build a machine learning model to detect trash in images, which other automated techniques could then use in tandem to process it accordingly.

The Dataset

We're going to use the Trash-Sea-10 Dataset from RoboFlow to perform our experiments. It has a total of 15,211 images across 5 different classes (Buoy, Can, Paper, Plastic Bag, Plastic Bottle). The dataset is already split into Train, Validation, and Test sets in an 83%-12%-5% split.
We are going to have three distinct steps in our dataset workflow. Let's take a look at each in detail:

Dataset Loading and Version Control

First, we're going to download our dataset from the source. In our case, RoboFlow Universe makes this simple. (If you're working with proprietary data instead, this step would generally just mean downloading that dataset onto your machine.)
This is a one-time step: we'll upload the dataset as a WandB Artifact, and from then on we can fetch it into any environment directly from there:
# Download Dataset for the first time
import os

import wandb
from roboflow import Roboflow

rf = Roboflow(api_key=os.getenv("YOUR_ROBOFLOW_API_KEY"))
project = rf.workspace("easyhyeon").project("trash-sea")
dataset = project.version(10).download("yolov5")

# Upload as a WandB Artifact
with wandb.init(project=PROJECT_NAME, entity=ENTITY, resume="allow", save_code=True) as run:
    artifact = wandb.Artifact('trash-sea-10', type='dataset')
    artifact.add_dir(DATASET_PATH)
    run.log_artifact(artifact)
Now that our dataset is uploaded, we don't need to download it from the source every time anymore. We can quickly pull it from WandB with just a few lines of code!
# Get Artifact
with wandb.init(project=PROJECT_NAME, entity=ENTITY) as run:
    artifact = run.use_artifact(
        'ml-colabs/fconn-yolo-nas/trash-sea-10:latest',
        type='dataset'
    )
    artifact_dir = artifact.download()

Dataset Registration

Now that our dataset is downloaded onto our machine, we need to ingest it and bring it into a representation that can be easily manipulated within our Python runtime. This step is often known as dataset registration.
SuperGradients allows us to perform dataset registration for a wide variety of datasets in diverse formats, with support for YOLO-style annotations, COCO-style annotations, and many more. We are going to use the YOLO-style annotations of the dataset for our workflow. The code:
# SuperGradients dataloader factories for YOLO-format detection datasets
from super_gradients.training.dataloaders.dataloaders import (
    coco_detection_yolo_format_train,
    coco_detection_yolo_format_val,
)

# Define our variables for the dataset
dataset_params = {
    'data_dir': DATASET_PATH,
    'train_images_dir': 'train/images',  # The relative path for training images
    'train_labels_dir': 'train/labels',  # The relative path for training annotations
    'val_images_dir': 'valid/images',    # The relative path for validation images
    'val_labels_dir': 'valid/labels',    # The relative path for validation annotations
    'test_images_dir': 'test/images',    # The relative path for test images
    'test_labels_dir': 'test/labels',    # The relative path for test annotations
    'classes': ["Buoy", "Can", "Paper", "Plastic Bag", "Plastic Bottle"]  # Classes
}

train_data = coco_detection_yolo_format_train(
    dataset_params={
        'data_dir': dataset_params['data_dir'],
        'images_dir': dataset_params['train_images_dir'],
        'labels_dir': dataset_params['train_labels_dir'],
        'classes': dataset_params['classes'],
    },
    dataloader_params={
        'batch_size': config["batch_size"],  # Set batch_size according to experiments
        # num_workers defines how many worker processes are spawned for loading data.
        # On a high-CPU runtime, feel free to set it to 2 or more; otherwise reduce it
        # to 0 so the main process performs dataset fetching into RAM.
        'num_workers': 2
    }
)

val_data = coco_detection_yolo_format_val(
    dataset_params={
        'data_dir': dataset_params['data_dir'],
        'images_dir': dataset_params['val_images_dir'],
        'labels_dir': dataset_params['val_labels_dir'],
        'classes': dataset_params['classes'],
    },
    dataloader_params={
        'batch_size': config["batch_size"],
        'num_workers': 2
    }
)

test_data = coco_detection_yolo_format_val(
    dataset_params={
        'data_dir': dataset_params['data_dir'],
        'images_dir': dataset_params['test_images_dir'],
        'labels_dir': dataset_params['test_labels_dir'],
        'classes': dataset_params['classes'],
    },
    dataloader_params={
        'batch_size': config["batch_size"],
        'num_workers': 2
    }
)
We now have our dataset up and ready for use. SuperGradients gives us DataLoaders that we can use for training jobs as well as for smaller tasks in a performant manner. Now, it's time to explore what this dataset is made of.

Exploratory Data Analysis

We can divide our analysis of the dataset into specific sections. First, we'll look at the high-level picture: the class-wise annotation distribution. Then, we'll examine the quality of individual annotations at a lower level.
You can follow along with the data analysis section of this report in the accompanying Google Colab.
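If you just want a rough idea of how the class-wise annotation distribution can be computed, here's a minimal sketch that walks the YOLO-format label files and logs the counts to W&B as a table. It reuses the PROJECT_NAME, ENTITY, and DATASET_PATH placeholders from earlier; the count_annotations helper is purely illustrative and not taken from the Colab.
# Minimal sketch: count YOLO-format annotations per class and log them to W&B.
import os
from collections import Counter

import wandb

CLASSES = ["Buoy", "Can", "Paper", "Plastic Bag", "Plastic Bottle"]

def count_annotations(labels_dir):
    """Count bounding boxes per class across all YOLO label files in a directory."""
    counts = Counter()
    for fname in os.listdir(labels_dir):
        if not fname.endswith(".txt"):
            continue
        with open(os.path.join(labels_dir, fname)) as f:
            for line in f:
                class_id = int(line.split()[0])  # first token of a YOLO label line is the class index
                counts[CLASSES[class_id]] += 1
    return counts

with wandb.init(project=PROJECT_NAME, entity=ENTITY, job_type="eda") as run:
    counts = count_annotations(os.path.join(DATASET_PATH, "train/labels"))
    table = wandb.Table(
        columns=["class", "num_annotations"],
        data=[[c, counts[c]] for c in CLASSES],
    )
    run.log({"train_class_distribution": table})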





Defining Our Learning Objectives

Looking at the data, we can deduce the following information:
  • We have 5 specific classes (Buoy, Can, Paper, Plastic Bag, and Plastic Bottle) wherein the number of objects varies significantly.
  • The dataset shows a deficiency of the Can and Plastic Bag classes but has an abundance of the Buoy and Plastic Bottle objects. This indicates a class imbalance that should be corrected.
  • Objects cluster heavily in the bottom-right area of the frame, across 3 different classes. This locality skew is a concern, and translation invariance will play an important role in handling it.
  • Plastic Bottles and Paper have been captured from various angles and at various zoom levels, contributing to a large variation in bounding box sizes for these classes.
  • Buoys are normally captured from afar, contributing to a low variation within the size of the bounding boxes for this class.
  • All classes have a significant number of outliers when it comes to bounding box size.
Keeping all of these points in mind, we can now loosely define our network's learning objective: given an RGB image of plastic waste in the ocean, detect the type of each waste object present and place a bounding box around it. The challenges to overcome are class imbalance, high variance in bounding box sizes, object-locality skew (towards the bottom-right of the image), and overall model performance.

The Model

Choosing a model is often a difficult decision that needs experimentation and requires the user to make important design choices. Within the domain of object detection, the YOLO class of models has been considered state-of-the-art for a long time. Let's dive deeper into them.

YOLO Class of Models

The YOLO (You Only Look Once) model is one of the most renowned vision architectures for object detection. Originally introduced by Joseph Redmon et al. and popularized in recent years by implementations such as those from Ultralytics, these models are typically trained on the MS-COCO (Common Objects in Context) dataset, which makes them strong out of the box across a wide range of common object categories. Other model checkpoints also exist, trained on datasets like Open Images V7 and PASCAL-VOC.

YOLO-NAS Class of Models

The YOLO-NAS (Neural Architecture Search) class of models was developed and perfected by Deci.ai. The model was built upon the original YOLO architecture, and honed with the use of Deci.ai's proprietary Neural Architecture Search technology, AutoNAC.
YOLO-NAS is a foundational object detection model that incorporates several optimizations to improve accuracy and inference performance. It is pre-trained on several object detection datasets like COCO, Objects365, and Roboflow-100, which makes it a strong out-of-the-box contender for most general detection tasks.

Some of the most important optimizations include:
  • Use of advanced regimes during the pre-training stage, such as knowledge distillation, post-training quantization (through the introduction of RepVGG blocks into the model architecture), quantization-aware training, and the introduction of a Distribution Focal Loss term.
  • A quantization-compatible approach wherein the Neural Architecture Search algorithm adaptively quantizes layers, skipping those that would have a significant impact on the network's accuracy/latency trade-off. As a result, the network supports compression down to INT8 quantization while suffering only a minuscule drop in precision.
This class of models has three networks available with original weights, and three with INT8-quantized weights. We'll run our experiments on the original weights only, keeping all three variants in play so we can measure their differences and choose based on performance across several fronts. The models are as follows:
  • YOLO-NAS-S: With 19M parameters, this is the smallest YOLO-NAS variant. Deci.ai reports an impressive 47.5 mAP on the COCO 2017 validation set. Owing to its small size, it is the nimblest and fastest model in its class.
  • YOLO-NAS-M: With 51.1M parameters, this is the medium-sized variant, balancing size and accuracy. It reports 51.55 mAP on the same validation set.
  • YOLO-NAS-L: The largest of its class at 66.9M parameters, and the most accurate, reaching 52.22 mAP on the same validation set.
To learn more about this class of networks and their architecture, the original Deci.ai blog post goes into more detail.
Now, let's see how we use this model to solve our problem!
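As a small sketch of what instantiating one of these variants looks like through SuperGradients (the variant name below is illustrative, and the dataset_params dictionary from earlier supplies the class list):
from super_gradients.training import models

# Instantiate a YOLO-NAS variant with COCO pre-trained weights, replacing the
# detection head so it predicts our 5 trash classes instead of the 80 COCO classes.
model = models.get(
    "yolo_nas_l",  # or "yolo_nas_s" / "yolo_nas_m"
    num_classes=len(dataset_params["classes"]),
    pretrained_weights="coco",
)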

Experimentations and Actionable Insights from our Baselines

Baseline Experiments

To understand how much we can gain from the hyperparameter optimization that WandB Sweeps provides, we must first establish strong baselines to compare against. We train each YOLO-NAS variant (S, M, and L) on an NVIDIA T4 GPU provisioned in a free-tier Google Colab notebook.
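Here's a rough sketch of what each baseline training job looks like with SuperGradients and its built-in W&B logger. The exact hyperparameters, loss settings, and logger options used in the Colab may differ, so treat the values below as illustrative.
from super_gradients.training import Trainer
from super_gradients.training.losses import PPYoloELoss
from super_gradients.training.metrics import DetectionMetrics_050
from super_gradients.training.models.detection_models.pp_yolo_e import PPYoloEPostPredictionCallback

trainer = Trainer(experiment_name="yolo_nas_s_baseline", ckpt_root_dir="checkpoints")

training_params = {
    "max_epochs": 10,    # short baseline runs on the T4
    "initial_lr": 5e-4,  # illustrative value, tuned later by the sweep
    "optimizer": "Adam",
    "mixed_precision": True,
    "loss": PPYoloELoss(
        use_static_assigner=False,
        num_classes=len(dataset_params["classes"]),
        reg_max=16,
    ),
    "valid_metrics_list": [
        DetectionMetrics_050(
            score_thres=0.1,
            top_k_predictions=300,
            num_cls=len(dataset_params["classes"]),
            normalize_targets=True,
            post_prediction_callback=PPYoloEPostPredictionCallback(
                score_threshold=0.01, nms_top_k=1000, max_predictions=300, nms_threshold=0.7
            ),
        )
    ],
    "metric_to_watch": "mAP@0.50",
    # Stream metrics and checkpoints to W&B via SuperGradients' built-in logger
    # (logger options here are illustrative; see the SuperGradients docs for the full set).
    "sg_logger": "wandb_sg_logger",
    "sg_logger_params": {"project_name": PROJECT_NAME, "entity": ENTITY, "save_checkpoints_remote": True},
}

trainer.train(
    model=model,
    training_params=training_params,
    train_loader=train_data,
    valid_loader=val_data,
)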



Translating Insights into Performance

Now that we've learned so much from our baselines, it's time to make use of the Sweeps feature and begin finding the best parameters for our job. You can follow along with this section in another Google Colab:

Sweeps require you to initialize an agent that executes each run and optimizes over the search space to find the right set of hyperparameters. Our sweep will execute 200 runs on an NVIDIA A6000 GPU.
We're going to optimize our model to maximize the validation mAP@0.50 metric, a strong target for the Bayesian optimization search to find hyperparameters that yield the best performance from our models.
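A minimal sketch of what such a sweep configuration can look like is shown below. The search space and the logged metric name are illustrative, and the train function is assumed to wrap the SuperGradients training call from the baseline step and log validation mAP@0.50 to W&B.
import wandb

# Illustrative sweep configuration: Bayesian search that maximizes validation mAP@0.50.
# The metric name must match the key that the training function logs to W&B.
sweep_config = {
    "method": "bayes",
    "metric": {"name": "valid_mAP@0.50", "goal": "maximize"},
    "parameters": {
        "initial_lr": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-2},
        "batch_size": {"values": [8, 16, 32]},
        "optimizer": {"values": ["Adam", "AdamW", "SGD"]},
    },
}

sweep_id = wandb.sweep(sweep_config, project=PROJECT_NAME, entity=ENTITY)
wandb.agent(sweep_id, function=train, count=200)  # 200 runs, as described above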



Final Model Training

Now that we can see what the right hyperparameters can unlock, we'll use them to train for longer and obtain the best model weights.
We train two backbones, the medium and large variants, each with its respective best set of hyperparameters, for 30 epochs on an NVIDIA A6000 GPU.
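One way to pull the best hyperparameters out of the finished sweep before launching these longer runs is via the W&B public API; here's a quick sketch, where SWEEP_ID is a placeholder for the sweep's ID.
import wandb

api = wandb.Api()
sweep = api.sweep(f"{ENTITY}/{PROJECT_NAME}/{SWEEP_ID}")  # "<entity>/<project>/<sweep_id>"
best_run = sweep.best_run()    # selected according to the sweep's metric (validation mAP@0.50)
best_config = best_run.config  # hyperparameters to reuse for the 30-epoch training runs
print(best_config)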
In these experiments, the large model outclasses the medium model, as expected. The loss curves show a clear winner, and the validation-set evaluation paints a similar picture:



Inference on Test Data - Performance with Test Distributions

We now perform a full evaluation on the unseen test dataset. We calculate the F1 and mAP scores along with the loss for each of the two models, and find that the large variant does better on every metric other than mAP: its loss is significantly lower on each loss term, and its F1 score is correspondingly higher.
On the other hand, the mAP score for the medium variant slightly edges out that of the large variant.
Given this, we can say that the medium variant is slightly better at identifying the area of interest of an object, while the large variant is better at identifying its class.
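For reference, the test-set evaluation can be run with the same Trainer on the held-out test loader; here's a hedged sketch with illustrative metric settings, where best_model stands in for the final medium or large checkpoint.
from super_gradients.training.metrics import DetectionMetrics_050
from super_gradients.training.models.detection_models.pp_yolo_e import PPYoloEPostPredictionCallback

test_results = trainer.test(
    model=best_model,  # the final medium or large checkpoint
    test_loader=test_data,
    test_metrics_list=[
        DetectionMetrics_050(
            score_thres=0.1,
            top_k_predictions=300,
            num_cls=len(dataset_params["classes"]),
            normalize_targets=True,
            post_prediction_callback=PPYoloEPostPredictionCallback(
                score_threshold=0.01, nms_top_k=1000, max_predictions=300, nms_threshold=0.7
            ),
        )
    ],
)
print(test_results)  # includes F1@0.50 and mAP@0.50 for comparing the two variants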



Conclusion

That's it! We have now taken a look at how we can leverage the existing integration of SuperGradients and Weights & Biases to create a strong object detection model that can solve the problem of identifying plastics in the open sea. To recap, we learned how to:
  • Load our data present in the YOLO dataset format using SuperGradients.
  • Instantiate and use the latest YOLO-NAS model from Deci.ai.
  • Perform dataset analysis using Weave by Weights & Biases.
  • Create baselines for each model variant available.
  • Initiate a full WandB Sweep for 200 runs and get the best set of hyperparameters.
  • Train a final model for a longer time with more epochs.
  • Test the model on unseen data and gather results.

