Object Detection using YOLOv8: An End-to-End Workflow
A comprehensive guide to building an object detection workflow using Ultralytics YOLOv8 and Weights & Biases.
In this report, we'll take you through an object detection workflow for autonomous vehicles with Weights & Biases. More specifically, you'll learn how to create a baseline object detection model using the YOLOv8 models from Ultralytics, improve it with continued experimentation (including selecting our highest-performing backbone architecture and tuning our hyperparameters), analyze it with some common metrics, and identify which candidate model is our best performer, all in W&B.
What We'll Be Covering
- Exploring our Datasets using Tables
- Establishing The Learning Objective Of Our YOLOv8 Model
- Baseline Experiments
- Comparing YOLOv8 Flavors in W&B
- Tuning Hyperparameters using W&B Sweeps
- Narrowing Down on the Best Backbone using Sweeps
- Training the Final Object Detection Model for Production
- Metrics are half the Truth
- YOLOv8m Inference on Test Data
- Conclusion
Exploring our Datasets using Tables
The Berkeley Deep Drive 100K Dataset (BDD100K) is a collection of video data for heterogeneous multitask learning. Unsurprisingly, it contains 100,000 videos from more than 50,000 individual rides. That variety is key as it allows us to have diverse scenes we want our model to understand–city streets, rural backroads, and highways, as well as a variety of weather and light conditions. BDD100K can be used for a sizeable portion of typical AV modeling (think lane detection, instance segmentation, etc.).
But today, we'll be using it for object detection.

First, let’s get our data. To do this, we’ll use W&B Artifacts, which makes it easy and convenient to store and version our dataset. Creating a new version of the dataset and fetching a particular version takes only a couple of lines of code:
import wandb

# Create Artifact
with wandb.init() as run:
    artifact = wandb.Artifact('bdd100k-yolov5', type='dataset')
    artifact.add_dir(dataset_path)
    run.log_artifact(artifact)

# Fetch Artifact
with wandb.init() as run:
    artifact = run.use_artifact('av-team/bdd100k-perception/bdd100k-yolov5:latest', type='dataset')
    artifact_dir = artifact.download()
We're hosting a subset of the BDD100K dataset with object-detection annotations converted to a format compatible with training using the YOLOv5 framework by Ultralytics. Here's how that dataset looks as an Artifact:
bdd100k-ultralytics-format (direct lineage view of the dataset Artifact)
We can also use Tables and Weave in our W&B workspace to visualize and explore our images and segmentation labels. Specifically, we'll dig into our subset of data to find out exactly what's in there. We'll quickly analyze the frequency distribution of the annotation labels using Custom Plots by Weights & Biases. This can be valuable for all sorts of things–finding over- or under-represented classes, for example. In fact, below, you'll see our model won't do well with trains as we have no examples in this particular selection.
These Tables are fully interactive, so feel free to explore. For example, you can click on any of the images in the Image-BBox column below and toggle both the bounding box and semantic segmentation overlays corresponding to each image (we have a quick example gif at the end of this section as well).
Along with the graphs of class frequencies below, you'll see a few Tables. The first contains our dataset, with object detection and segmentation labels, as well as annotations for weather, time of day, and scene. Below that, you'll see BDD100K grouped by weather conditions, scene, and time of day.
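To give a concrete idea of how such a Table is built, here's a minimal sketch of logging images with bounding-box overlays and a class-frequency bar chart to W&B. The load_samples() loader, the counts dictionary, and the project name are hypothetical placeholders, not code from this report:

import wandb

CLASS_NAMES = ["bike", "bus", "car", "motorcycle", "person",
               "rider", "traffic light", "traffic sign", "train", "truck"]

def bbox_image(image, boxes):
    # Wrap an image and its boxes as a wandb.Image with ground-truth overlays.
    box_data = [
        {
            "position": {"middle": [x, y], "width": w, "height": h},
            "domain": "pixel",
            "class_id": class_id,
            "box_caption": CLASS_NAMES[class_id],
        }
        for (class_id, x, y, w, h) in boxes
    ]
    return wandb.Image(image, boxes={"ground_truth": {
        "box_data": box_data,
        "class_labels": dict(enumerate(CLASS_NAMES)),
    }})

with wandb.init(project="bdd100k-perception", job_type="eda") as run:
    # One row per image: the annotated image plus the scene-level attributes.
    table = wandb.Table(columns=["file_name", "image_bbox", "weather", "timeofday", "scene"])
    for sample in load_samples():  # hypothetical loader yielding dicts
        table.add_data(sample["file_name"],
                       bbox_image(sample["image"], sample["boxes"]),
                       sample["weather"], sample["timeofday"], sample["scene"])
    run.log({"bdd100k_eda": table})

    # Class-frequency custom bar chart; `counts` maps class name -> number of annotations.
    freq = wandb.Table(data=[[c, counts[c]] for c in CLASS_NAMES], columns=["class", "count"])
    run.log({"class_frequency": wandb.plot.bar(freq, "class", "count",
                                               title="Annotation frequency per class")})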
Analysis and Exploration of the BDD100K Dataset using Weave and Tables

Exploring Annotations Interactively
Establishing The Learning Objective Of Our YOLOv8 Model
Now that we've examined our dataset and its distribution, let's look into what our YOLOv8 object detection model is supposed to learn. Here, we're interested in bounding box annotations corresponding to all objects of interest (such as vehicles, pedestrians, obstacles, etc.) present in a given frame of a video or camera feed. Our model not only needs to predict all the bounding box coordinates but also label each box as one of the following classes (a sketch of the corresponding label format follows the list):
- bike 🚲
- bus 🚎
- car 🚗
- motorcycle 🏍
- person 🧍
- rider 🚴♀️
- traffic light 🚦
- traffic sign 🛑
- train 🚝
- truck 🚚
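For readers unfamiliar with the Ultralytics/YOLO annotation format, here is a small illustrative sketch of how a single label line maps onto these classes. The exact class-index ordering is defined by the dataset's data YAML, so the order below is an assumption:

# Each line of an image's .txt label file is:
#   <class_id> <x_center> <y_center> <width> <height>   (coordinates normalized to 0-1)
CLASS_NAMES = ["bike", "bus", "car", "motorcycle", "person",
               "rider", "traffic light", "traffic sign", "train", "truck"]

def parse_label_line(line: str):
    # Turn one annotation line into a class name and a normalized box.
    class_id, x_center, y_center, width, height = line.split()
    return CLASS_NAMES[int(class_id)], tuple(map(float, (x_center, y_center, width, height)))

# Example: a car roughly centered in the frame, covering about 20% x 10% of the image.
print(parse_label_line("2 0.51 0.63 0.20 0.10"))
# -> ('car', (0.51, 0.63, 0.2, 0.1))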
Baseline Experiments
To establish our baselines, we decided to use the YOLOv8 family of foundational models for computer vision developed by Ultralytics. We ran the baseline experiments for every variant of the YOLOv8 family using the default set of hyperparameters for five epochs each.
We'll be using the Ultralytics integration for Weights & Biases not only to track our experiments, but also to verify the correctness of our pipeline implementation and the annotation format of our dataset, and to check the improvement (or lack thereof) in our models' performance on the validation set while training. If you want to learn more about the advanced features this integration offers, feel free to check out the following report:
A baseline object detection workflow with Ultralytics and WandB
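For reference, a single baseline run with this integration might look roughly like the sketch below. The project name, run name, and dataset YAML path are assumptions rather than the exact code used for these experiments:

import wandb
from ultralytics import YOLO
from wandb.integration.ultralytics import add_wandb_callback

# One baseline run; repeat with yolov8s/m/l/x to cover all the variants.
wandb.init(project="bdd100k-perception", job_type="baseline", name="yolov8n-baseline")

model = YOLO("yolov8n.pt")
add_wandb_callback(model)  # attach the W&B callbacks for metric and prediction logging

# Default hyperparameters, five epochs, as in the baseline experiments above.
model.train(data="bdd100k.yaml", epochs=5, imgsz=640)

wandb.finish()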
Comparing YOLOv8 Flavors in W&B
Let's compare which of our YOLOv8 models performs best. To do so, we'll look at a variety of common metrics, losses, and model parameter stats, such as the following (a small worked example follows the list):
- Precision attempts to answer the question, "What proportion of positive identifications was correct?" For a model that produces no false positives, the precision would be 1.0.
- Recall attempts to answer the following question, "What proportion of actual positives was identified correctly?" For a model that produces no false negatives, the recall would be 1.0.
- Average Precision (or AP) summarizes the precision-recall curve of a single class into one value, the average of the precisions across recall levels. Mean Average Precision (mAP) is the mean of the per-class AP scores and is a widely used metric for evaluating the accuracy of object detection models, i.e., how well a model can locate and classify objects in an image.
- Box Loss is a component of the loss function that measures the error in predicting the coordinates of the bounding boxes that surround the detected objects. Its purpose is to penalize the model when it makes errors in predicting the precise location and size of the bounding boxes.
- Classification Loss is another component of the loss function in object detection. It measures the error in predicting the class labels or categories of objects present in the bounding boxes.
- Detection Focal Loss is a variant of the classification loss designed to address class imbalance issues in object detection tasks. This helps the model focus on improving the accuracy of challenging object detections.
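As a quick, made-up numeric illustration of how these quantities relate (the counts and per-class AP values below are invented for the example, not taken from our experiments):

# Suppose a model produces 90 detections, of which 80 match a ground-truth object,
# and it misses 20 ground-truth objects entirely.
true_positives, false_positives, false_negatives = 80, 10, 20

precision = true_positives / (true_positives + false_positives)  # 0.889 -> "how many detections were right?"
recall = true_positives / (true_positives + false_negatives)     # 0.800 -> "how many objects did we find?"

# mAP: per-class AP (area under the precision-recall curve), averaged over classes.
per_class_ap = {"car": 0.62, "person": 0.48, "traffic light": 0.41}  # invented AP values
mean_ap = sum(per_class_ap.values()) / len(per_class_ap)             # ~0.503

print(precision, recall, mean_ap)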
Baseline experiments with the YOLOv8 models from Ultralytics using WandB
Based on the results of the baseline experiments, we can see that the larger the model is, the better it performs with respect to the object detection metrics. However, the higher parameter count of the larger YOLOv8 models makes them difficult and expensive to deploy in production for a real-life autonomous driving agent.
Which model is best for a production environment with resource constraints?
Although we can see that the largest YOLOv8 variants perform the best in terms of object detection metrics, we cannot conclusively say that the smaller variants cannot outperform them, given that the medium and small variants do significantly better at optimizing the detection losses. So, the question we ask ourselves is:
Is it possible to improve the performance of the smaller models somehow by choosing an optimal set of hyperparameters?
Tuning Hyperparameters using W&B Sweeps
To improve on the baseline without making the model too big and expensive for real-time inference in a production environment, we need to select not only the best model but also the best set of hyperparameters to train it with. Although this can be quite a daunting task, it was made easy for us by Sweeps, a scalable and customizable hyperparameter search and optimization engine by Weights & Biases.
Sweeps makes it extremely easy to employ a Bayesian hyperparameter search with the goal of maximizing the mean Average Precision (mAP) of the model on the validation dataset. Sweeps not only helped us arrive at the optimal set of hyperparameters for our final experiments, but also gave us additional insight into the hyperparameter optimization process via the correlations and importance of the different hyperparameters used in the experiments.
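A Bayesian sweep like the one described above could be configured roughly as follows. The search space, metric name, and run count are illustrative assumptions, not the exact configuration used for these experiments:

import wandb

sweep_config = {
    "method": "bayes",  # Bayesian hyperparameter search
    "metric": {"name": "metrics/mAP50-95(B)", "goal": "maximize"},  # validation mAP as logged by Ultralytics
    "parameters": {
        "model": {"values": ["yolov8n.pt", "yolov8s.pt", "yolov8m.pt", "yolov5su.pt", "yolov5mu.pt"]},
        "batch": {"values": [16, 32, 64]},
        "imgsz": {"values": [512, 640]},
        "weight_decay": {"min": 0.0001, "max": 0.001},
        # learning-rate hyperparameters are deliberately excluded (see the next section)
    },
}

def train():
    from ultralytics import YOLO
    run = wandb.init()  # the sweep agent injects the sampled config into run.config
    cfg = run.config
    model = YOLO(cfg.model)
    model.train(data="bdd100k.yaml", epochs=5, batch=cfg.batch,
                imgsz=cfg.imgsz, weight_decay=cfg.weight_decay)

sweep_id = wandb.sweep(sweep_config, project="bdd100k-perception")
wandb.agent(sweep_id, function=train, count=50)  # number of sweep runs is a placeholder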
Baseline experiments with the YOLOv8 models from Ultralytics using WandB
Narrowing Down on the Best Backbone using Sweeps
We won't include the hyperparameters related to the learning rate in the sweep. This is because we run each sweep experiment for just five epochs, whereas for the final model we want to train longer with comparatively lower learning rates. Including learning-rate hyperparameters in the sweep would favor higher learning rates over the comparatively short sweep runs, which could be detrimental when we train the model for longer.
Based on the importance and correlations of the different hyperparameters, we progressively apply filters to the hyperparameter coordinates in the parallel-coordinates plot. This narrows the potential candidates for a model suitable for a resource-constrained production environment down to two: yolov8m and yolov5mu. Now, we will perform our final experiments to determine which model we can take into production.
Training the Final Object Detection Model for Production
Since we have narrowed our preferred models down to two backbones, let's train each of them for longer with the optimized set of hyperparameters from the sweep and analyze the results further.
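These longer final runs might look something like the sketch below; the tuned hyperparameter values and the epoch count are placeholders rather than the actual sweep results:

import wandb
from ultralytics import YOLO
from wandb.integration.ultralytics import add_wandb_callback

best_hparams = dict(batch=32, imgsz=640, weight_decay=0.0005)  # placeholder values for the tuned hyperparameters

for backbone in ["yolov8m.pt", "yolov5mu.pt"]:
    wandb.init(project="bdd100k-perception", job_type="final-training", name=backbone.replace(".pt", ""))
    model = YOLO(backbone)
    add_wandb_callback(model)
    model.train(data="bdd100k.yaml", epochs=50, **best_hparams)  # a longer schedule than the 5-epoch sweep runs
    wandb.finish()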
Analyzing the Metrics from the final training experiments.
Analyzing the metrics of the final training experiments, we can see that yolov5mu is the clear winner over yolov8m. It should also be noted that yolov5mu performs better than yolov8m while being slightly smaller in terms of parameters and faster in terms of inference speed. Before selecting yolov5mu as the model for production, let's analyze the predicted bounding boxes of both models on the validation dataset.
Metrics are half the Truth
If we analyze carefully, we will notice that although both models are almost equivalent in their capability to detect objects, yolov5mu has a tendency:
- to confuse objects that look similar (for example, it mistakes a truck for a bus in the second image and misclassifies a truck as a car in the 15th image), and
- to miss important objects (for example, in the 9th image, it misses several bicycle riders).
However, if we examine the results corresponding to yolov8m, we can see that they are more consistent and reliable overall. Hence, it's a more likely candidate to be taken into production, even though it's marginally more expensive in terms of compute.
YOLOv8m Inference on Test Data
Finally, let's visualize the predictions of yolov8m, our most suitable candidate for production, on the test split of our dataset.
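A minimal inference-and-logging sketch for this step is shown below; the checkpoint path and test-image directory are assumptions:

import wandb
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # assumed path to the trained yolov8m checkpoint

with wandb.init(project="bdd100k-perception", job_type="test-inference") as run:
    table = wandb.Table(columns=["image", "num_detections"])
    # Stream predictions over the test images to avoid holding everything in memory.
    for result in model.predict(source="bdd100k/images/test", conf=0.25, stream=True):
        annotated = result.plot()[:, :, ::-1]  # result.plot() returns a BGR array; flip to RGB for logging
        table.add_data(wandb.Image(annotated), len(result.boxes))
    run.log({"yolov8m_test_predictions": table})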
Conclusion
- In this report, we explored an end-to-end object detection workflow for an autonomous vehicle use case using Ultralytics YOLOv8 models and WandB.
- We used WandB Tables to visualize and explore our dataset and gain important insights about the distribution of data across various attributes and annotations.
- We set up baseline experiments using the YOLOv8 family of models and demonstrated the advanced tracking and visualization features offered by the WandB integration for Ultralytics.
- We used WandB Sweeps not only to find the most optimized sets of hyperparameters but also to determine which YOLO models could give us the best performance in a production environment with constrained compute resources.
- We then ran the final experiments by training the two models best suited for production according to the insights gained from the sweep.
- We closely analyzed the performance and the results produced by the final models and decided which model we would like to take into production.
- We also visualized the prediction results of the final model selected for production using WandB Tables.