Object Detection using YOLOv8: An End-to-End Workflow
A comprehensive guide to building an object detection workflow using Ultralytics YOLOv8 and Weights & Biases.
In this report, we'll take you through an object detection workflow for autonomous vehicles with Weights & Biases. More specifically, you'll learn how to create a baseline object detection model using the YOLOv8 models from Ultralytics, improve it with continued experimentation (including selecting our highest-performing backbone architecture and tuning our hyperparameters), analyze it with some common metrics, and identify which candidate model is our best performer, all in W&B.
What We'll Be Covering
- Exploring our Datasets using Tables
- Establishing The Learning Objective Of Our YOLOv8 Model
- Baseline Experiments
- Comparing YOLOv8 Flavors in W&B
- Tuning Hyperparameters using W&B Sweeps
- Narrowing Down on the Best Backbone using Sweeps
- Training the Final Object Detection Model for Production
- Metrics are half the Truth
- YOLOv8m Inference on Test Data
- Conclusion
Exploring our Datasets using Tables
The Berkeley Deep Drive 100K Dataset (BDD100K) is a collection of video data for heterogeneous multitask learning. Unsurprisingly, it contains 100,000 videos from more than 50,000 individual rides. That variety is key as it allows us to have diverse scenes we want our model to understand–city streets, rural backroads, and highways, as well as a variety of weather and light conditions. BDD100K can be used for a sizeable portion of typical AV modeling (think lane detection, instance segmentation, etc.).
But today, we'll be using it for object detection.

First, let’s get our data. To do this, we’ll use W&B Artifacts, which makes it easy and convenient to store and version our dataset. Creating a new version of the dataset and fetching a particular version takes only a couple of lines of code:
import wandb

# Create Artifact
with wandb.init() as run:
    artifact = wandb.Artifact('bdd100k-yolov5', type='dataset')
    artifact.add_dir(dataset_path)
    run.log_artifact(artifact)

# Fetch Artifact
with wandb.init() as run:
    artifact = run.use_artifact('av-team/bdd100k-perception/bdd100k-yolov5:latest', type='dataset')
    artifact_dir = artifact.download()
We're hosting a subset of the BDD100K dataset with object-detection annotations converted to a format compatible with training using the YOLOv5 framework by Ultralytics. Here's how that dataset looks as an Artifact:
bdd100k-ultralytics-format (direct lineage view of the dataset Artifact)
We can also use Tables and Weave in our W&B workspace to visualize and explore our images and segmentation labels. Specifically, we'll dig into our subset of data to find out exactly what's in there. We'll quickly analyze the frequency distribution of the annotation labels using Custom Plots by Weights & Biases. This can be valuable for all sorts of things–finding over- or under-represented classes, for example. In fact, below, you'll see our model won't do well with trains as we have no examples in this particular selection.
These Tables are fully interactive, so feel free to explore. For example, you can click on any of the images in the Image-BBox column below and toggle both the bounding box and semantic segmentation overlays corresponding to each image (we have a quick example gif at the end of this section as well).
Along with the graphs of class frequencies below, you'll see a few Tables. The first contains our dataset, with object detection and segmentation labels, as well as annotations for weather, time of day, and scene. Below that, you'll see BDD100K grouped by weather conditions, scene, and time of day.
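To give a concrete idea of how such a Table is built, here's a minimal sketch of logging images with bounding-box overlays and a class-frequency bar chart to W&B. The load_samples() loader, the counts dictionary, and the project name are hypothetical placeholders, not code from this report:

import wandb

CLASS_NAMES = ["bike", "bus", "car", "motorcycle", "person",
               "rider", "traffic light", "traffic sign", "train", "truck"]

def bbox_image(image, boxes):
    # Wrap an image and its boxes as a wandb.Image with ground-truth overlays.
    box_data = [
        {
            "position": {"middle": [x, y], "width": w, "height": h},
            "domain": "pixel",
            "class_id": class_id,
            "box_caption": CLASS_NAMES[class_id],
        }
        for (class_id, x, y, w, h) in boxes
    ]
    return wandb.Image(image, boxes={"ground_truth": {
        "box_data": box_data,
        "class_labels": dict(enumerate(CLASS_NAMES)),
    }})

with wandb.init(project="bdd100k-perception", job_type="eda") as run:
    # One row per image: the annotated image plus the scene-level attributes.
    table = wandb.Table(columns=["file_name", "image_bbox", "weather", "timeofday", "scene"])
    for sample in load_samples():  # hypothetical loader yielding dicts
        table.add_data(sample["file_name"],
                       bbox_image(sample["image"], sample["boxes"]),
                       sample["weather"], sample["timeofday"], sample["scene"])
    run.log({"bdd100k_eda": table})

    # Class-frequency custom bar chart; `counts` maps class name -> number of annotations.
    freq = wandb.Table(data=[[c, counts[c]] for c in CLASS_NAMES], columns=["class", "count"])
    run.log({"class_frequency": wandb.plot.bar(freq, "class", "count",
                                               title="Annotation frequency per class")})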
Analysis and Exploration of the BDD100K Dataset using Weave and Tables

Exploring Annotations Interactively
Establishing The Learning Objective Of Our YOLOv8 Model
Now that we've examined our dataset and its distribution, let's look into what our YOLOv8 object detection model is supposed to learn. Here, we're interested in bounding box annotations corresponding to all objects of interest (such as vehicles, pedestrians, obstacles, etc.) present in a given frame of a video or camera feed. Our model not only needs to predict all the bounding box coordinates but also label each box as one of the following classes (a sketch of the corresponding label format follows the list):
- bike 🚲
- bus 🚎
- car 🚗
- motorcycle 🏍
- person 🧍
- rider 🚴♀️
- traffic light 🚦
- traffic sign 🛑
- train 🚝
- truck 🚚
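For readers unfamiliar with the Ultralytics/YOLO annotation format, here is a small illustrative sketch of how a single label line maps onto these classes. The exact class-index ordering is defined by the dataset's data YAML, so the order below is an assumption:

# Each line of an image's .txt label file is:
#   <class_id> <x_center> <y_center> <width> <height>   (coordinates normalized to 0-1)
CLASS_NAMES = ["bike", "bus", "car", "motorcycle", "person",
               "rider", "traffic light", "traffic sign", "train", "truck"]

def parse_label_line(line: str):
    # Turn one annotation line into a class name and a normalized box.
    class_id, x_center, y_center, width, height = line.split()
    return CLASS_NAMES[int(class_id)], tuple(map(float, (x_center, y_center, width, height)))

# Example: a car roughly centered in the frame, covering about 20% x 10% of the image.
print(parse_label_line("2 0.51 0.63 0.20 0.10"))
# -> ('car', (0.51, 0.63, 0.2, 0.1))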
Baseline Experiments
To establish our baselines, we decided to use the YOLOv8 family of foundational models for computer vision developed by Ultralytics. We ran the baseline experiments for every variant of the YOLOv8 family using the default set of hyperparameters for five epochs each.
We'll be using the Ultralytics integration for Weights & Biases not only to track our experiments, but also to verify the correctness of our pipeline implementation and the annotation format of our dataset, and to check the improvement (or lack thereof) in our models' performance on the validation set while training. If you want to learn more about the advanced features this integration offers, feel free to check out the following report:
A baseline object detection workflow with Ultralytics and WandB
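For reference, a single baseline run with this integration might look roughly like the sketch below. The project name, run name, and dataset YAML path are assumptions rather than the exact code used for these experiments:

import wandb
from ultralytics import YOLO
from wandb.integration.ultralytics import add_wandb_callback

# One baseline run; repeat with yolov8s/m/l/x to cover all the variants.
wandb.init(project="bdd100k-perception", job_type="baseline", name="yolov8n-baseline")

model = YOLO("yolov8n.pt")
add_wandb_callback(model)  # attach the W&B callbacks for metric and prediction logging

# Default hyperparameters, five epochs, as in the baseline experiments above.
model.train(data="bdd100k.yaml", epochs=5, imgsz=640)

wandb.finish()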
Comparing YOLOv8 Flavors in W&B
Let's compare which of our YOLOv8 models performs best. To do so, we'll look at a variety of common metrics, losses, and model parameter stats, such as the following (a small worked example follows the list):
- Precision attempts to answer the question, "What proportion of positive identifications was correct?" For a model that produces no false positives, the precision would be 1.0.
- Recall attempts to answer the following question, "What proportion of actual positives was identified correctly?" For a model that produces no false negatives, the recall would be 1.0.
- Average Precision (or AP) summarizes the precision-recall curve of a single class into one value, the average of the precisions across recall levels. Mean Average Precision (mAP) is the mean of the per-class AP scores and is a widely used metric for evaluating the accuracy of object detection models, i.e., how well a model can locate and classify objects in an image.
- Box Loss is a component of the loss function that measures the error in predicting the coordinates of the bounding boxes that surround the detected objects. Its purpose is to penalize the model when it makes errors in predicting the precise location and size of the bounding boxes.
- Classification Loss is another component of the loss function in object detection. It measures the error in predicting the class labels or categories of objects present in the bounding boxes.
- Detection Focal Loss is a variant of the classification loss designed to address class imbalance issues in object detection tasks. This helps the model focus on improving the accuracy of challenging object detections.
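As a quick, made-up numeric illustration of how these quantities relate (the counts and per-class AP values below are invented for the example, not taken from our experiments):

# Suppose a model produces 90 detections, of which 80 match a ground-truth object,
# and it misses 20 ground-truth objects entirely.
true_positives, false_positives, false_negatives = 80, 10, 20

precision = true_positives / (true_positives + false_positives)  # 0.889 -> "how many detections were right?"
recall = true_positives / (true_positives + false_negatives)     # 0.800 -> "how many objects did we find?"

# mAP: per-class AP (area under the precision-recall curve), averaged over classes.
per_class_ap = {"car": 0.62, "person": 0.48, "traffic light": 0.41}  # invented AP values
mean_ap = sum(per_class_ap.values()) / len(per_class_ap)             # ~0.503

print(precision, recall, mean_ap)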
Baseline experiments with the YOLOv8 models from Ultralytics using WandB
Based on the results of the baseline experiments, we can see that the larger the model is, the better it performs with respect to the object detection metrics. However, the higher parameter count of the larger YOLOv8 models makes them difficult and expensive to deploy in production for a real-life autonomous driving agent.
Which model is best for a production environment with resource constraints?
Although we can see that the largest YOLOv8 variants perform the best in terms of object detection metrics, we cannot conclusively say that the smaller variants cannot outperform them, given that the medium and small variants do significantly better at optimizing the detection losses. So, the question we ask ourselves is:
Is it possible to improve the performance of the smaller models somehow by choosing an optimal set of hyperparameters?
Tuning Hyperparameters using W&B Sweeps
To improve on the baseline without making the model too big and expensive for real-time inference in a production environment, we need to select not only the best model but also the best set of hyperparameters to train it with. Although this can be quite a daunting task, it was made easy for us by Sweeps, a scalable and customizable hyperparameter search and optimization engine by Weights & Biases.
Sweeps makes it extremely easy to employ a Bayesian hyperparameter search with the goal of maximizing the mean Average Precision (mAP) of the model on the validation dataset. Sweeps not only helped us arrive at the optimal set of hyperparameters for our final experiments, but also gave us additional insight into the hyperparameter optimization process via the correlations and importance of the different hyperparameters used in the experiments.
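A Bayesian sweep like the one described above could be configured roughly as follows. The search space, metric name, and run count are illustrative assumptions, not the exact configuration used for these experiments:

import wandb

sweep_config = {
    "method": "bayes",  # Bayesian hyperparameter search
    "metric": {"name": "metrics/mAP50-95(B)", "goal": "maximize"},  # validation mAP as logged by Ultralytics
    "parameters": {
        "model": {"values": ["yolov8n.pt", "yolov8s.pt", "yolov8m.pt", "yolov5su.pt", "yolov5mu.pt"]},
        "batch": {"values": [16, 32, 64]},
        "imgsz": {"values": [512, 640]},
        "weight_decay": {"min": 0.0001, "max": 0.001},
        # learning-rate hyperparameters are deliberately excluded (see the next section)
    },
}

def train():
    from ultralytics import YOLO
    run = wandb.init()  # the sweep agent injects the sampled config into run.config
    cfg = run.config
    model = YOLO(cfg.model)
    model.train(data="bdd100k.yaml", epochs=5, batch=cfg.batch,
                imgsz=cfg.imgsz, weight_decay=cfg.weight_decay)

sweep_id = wandb.sweep(sweep_config, project="bdd100k-perception")
wandb.agent(sweep_id, function=train, count=50)  # number of sweep runs is a placeholder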
Baseline experiments with the YOLOv8 models from Ultralytics using WandB
Narrowing Down on the Best Backbone using Sweeps
We won't include the hyperparameters related to the learning rate in the sweep. This is because we run each sweep experiment for just five epochs, whereas for the final model we want to train longer with comparatively lower learning rates. Including learning-rate hyperparameters in the sweep would favor higher learning rates over the comparatively short sweep runs, which could be detrimental when we train the model for longer.
Based on the importance and correlations of the different hyperparameters, we progressively apply filters to the hyperparameter coordinates in the parallel-coordinates plot. This narrows the potential candidates for a model suitable for a resource-constrained production environment down to two: yolov8m and yolov5mu. Now, we will perform our final experiments to determine which model we can take into production.
Training the Final Object Detection Model for Production
Since we have narrowed our preferred models down to two backbones, let's train each of them for longer with the optimized set of hyperparameters from the sweep and analyze the results further.
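These longer final runs might look something like the sketch below; the tuned hyperparameter values and the epoch count are placeholders rather than the actual sweep results:

import wandb
from ultralytics import YOLO
from wandb.integration.ultralytics import add_wandb_callback

best_hparams = dict(batch=32, imgsz=640, weight_decay=0.0005)  # placeholder values for the tuned hyperparameters

for backbone in ["yolov8m.pt", "yolov5mu.pt"]:
    wandb.init(project="bdd100k-perception", job_type="final-training", name=backbone.replace(".pt", ""))
    model = YOLO(backbone)
    add_wandb_callback(model)
    model.train(data="bdd100k.yaml", epochs=50, **best_hparams)  # a longer schedule than the 5-epoch sweep runs
    wandb.finish()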
Analyzing the Metrics from the final training experiments.
Analyzing the metrics of the final training experiments, we can see that yolov5mu is the clear winner over yolov8m. It should also be noted that yolov5mu performs better than yolov8m while being slightly smaller in terms of parameters and faster in terms of inference speed. Before selecting yolov5mu as the model for production, let's analyze the predicted bounding boxes of both models on the validation dataset.
Metrics are half the Truth
If we analyze carefully, we will notice that although both models are almost equivalent in their capability to detect objects, yolov5mu has a tendency:
- to confuse objects that look similar (for example, it mistakes a truck for a bus in the second image and misclassifies a truck as a car in the 15th image), and
- to miss important objects (for example, in the 9th image, it misses several bicycle riders).
However, if we examine the results corresponding to yolov8m, we can see that they are more consistent and reliable overall. Hence, it's a more likely candidate to be taken into production, even though it's marginally more expensive in terms of compute.
YOLOv8m Inference on Test Data
Finally, let's visualize the predictions of yolov8m, our most suitable candidate for production, on the test split of our dataset.
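A minimal inference-and-logging sketch for this step is shown below; the checkpoint path and test-image directory are assumptions:

import wandb
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # assumed path to the trained yolov8m checkpoint

with wandb.init(project="bdd100k-perception", job_type="test-inference") as run:
    table = wandb.Table(columns=["image", "num_detections"])
    # Stream predictions over the test images to avoid holding everything in memory.
    for result in model.predict(source="bdd100k/images/test", conf=0.25, stream=True):
        annotated = result.plot()[:, :, ::-1]  # result.plot() returns a BGR array; flip to RGB for logging
        table.add_data(wandb.Image(annotated), len(result.boxes))
    run.log({"yolov8m_test_predictions": table})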
Conclusion
- In this report, we explored an end-to-end object detection workflow for an autonomous vehicle use case using Ultralytics YOLOv8 models and WandB.
- We used WandB Tables to visualize and explore our dataset and gain important insights about the distribution of data across various attributes and annotations.
- We set up baseline experiments using the YOLOv8 family of models and demonstrated the advanced tracking and visualization features offered by the WandB integration for Ultralytics.
- We used WandB Sweeps not only to find the most optimized sets of hyperparameters but also to determine which YOLO models could give us the best performance in a production environment with constrained compute resources.
- We then ran the final experiments by training the two models best suited for production according to the insights gained from the sweep.
- We closely analyzed the performance and the results produced by the final models and decided which model we would like to take into production.
- We also visualized the prediction results of the final model selected for production using WandB Tables.