
How to fine-tune YOLO V9 on a custom dataset with W&B

Fine-tuning one of the best detection models!
Object detection stands as a pillar of modern AI applications, from sophisticated surveillance systems to the cutting-edge technology driving autonomous vehicles. Fine-tuning a robust object detection model like YOLOv9 allows for tailoring its capabilities to specialized datasets, thereby enhancing its performance and applicability. This comprehensive guide will delve into the process of fine-tuning YOLOv9, utilizing W&B for logging, and executing inference with Weave.



What is YOLO?

YOLOv9 stands for You Only Look Once, version 9: an exceptionally fast object detection framework built on a single convolutional neural network. Unlike traditional detection pipelines that classify many candidate regions per image, YOLOv9 processes the entire image in a single forward pass, resulting in significantly faster detection speeds.
Released in February 2024, YOLOv9 introduces groundbreaking techniques such as Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN) to address information loss and computational efficiency issues in computer vision tasks. These innovations are why YOLOv9 achieves outstanding real-time object detection performance, setting new benchmarks for precision and speed.


Introduction to YOLOv9 Fine-Tuning

Fine-tuning represents a pivotal step in adapting a pretrained model to align more closely with specific datasets, thereby optimizing its detection performance for particular tasks. YOLOv9, renowned for its blend of speed and precision, can be fine-tuned using additional data to further enhance its efficacy. This guide will meticulously walk through the steps required to fine-tune YOLOv9, emphasizing the integral roles of W&B for logging and Weave for running inference.

Prerequisites

Before embarking on this journey, ensure that you have the essential packages installed. The required packages include ultralytics for the YOLOv9 implementation, wandb for logging and tracking the training process, weave for seamless integration and inference operations, opencv-python for image processing, requests for handling HTTP requests, and pillow for image manipulation. These packages form the backbone of the entire fine-tuning and inference workflow, ensuring smooth and efficient operations.
pip install ultralytics wandb weave opencv-python requests pillow

Data Preparation

A cornerstone of effective model fine-tuning is the proper preparation of your dataset. For YOLOv9, this entails organizing your dataset in the YOLO format, which includes images and corresponding annotation files. Typically, your dataset structure should resemble the following: a directory named /data, within which you have separate subdirectories for images (/images) and labels (/labels). The images directory contains all your training and validation images, while the labels directory holds the annotation files. Each annotation file should detail the bounding box coordinates and class labels for objects present in the corresponding image. Properly formatted data is crucial as it directly influences the model's training and ultimately its performance.
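Putting this together (and assuming the standard train/val split used in the examples below), the full layout looks something like this:
/data
├── images
│   ├── train
│   │   ├── image1.jpg
│   │   └── image2.jpg
│   └── val
│       ├── image1.jpg
│       └── image2.jpg
└── labels
    ├── train
    │   ├── image1.txt
    │   └── image2.txt
    └── val
        ├── image1.txt
        └── image2.txt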

data.yaml File

The data.yaml file is a configuration file that specifies the paths to your training and validation datasets, along with the class names. Here is an example of what a data.yaml file might look like:
train: ./path_to_your_tr_data
val: ./path_to_your_val_data
nc: 5
names: ['class1', 'class2', 'class3', 'class4', 'class5']
In the file, you should have the following keys:
train: Path to the directory containing training images.
val: Path to the directory containing validation images.
nc: Number of classes in the dataset.
names: List of class names.
Properly configuring this file is crucial as it directs the model to the correct data locations and defines the classes it needs to detect.

Images Directory

The images directory contains all your training and validation images. These images should be in a standard format such as JPEG or PNG and named consistently, for example:
/data/images/train/image1.jpg
/data/images/train/image2.jpg
/data/images/val/image1.jpg
/data/images/val/image2.jpg

Labels Directory

The labels directory holds the annotation files, with each image having a corresponding text file that contains the bounding box coordinates and class labels for objects present in the image. The naming convention of the annotation files should match the image files, with the difference being the file extension, for example:
/data/labels/train/image1.txt
/data/labels/train/image2.txt
/data/labels/val/image1.txt
/data/labels/val/image2.txt

Annotation File Format

Each annotation file contains lines of text where each line represents one object in the image. The format for each line is as follows:
class_id center_x center_y width height
class_id: An integer representing the class of the object. This should correspond to a predefined list of classes.
center_x: The x-coordinate of the center of the bounding box, normalized by the width of the image (i.e., a value between 0 and 1).
center_y: The y-coordinate of the center of the bounding box, normalized by the height of the image (i.e., a value between 0 and 1).
width: The width of the bounding box, normalized by the width of the image (i.e., a value between 0 and 1).
height: The height of the bounding box, normalized by the height of the image (i.e., a value between 0 and 1).
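To make the normalization concrete, here is a minimal sketch of the conversion, assuming you start from a pixel-space box given by its corners (the image size and coordinates below are hypothetical values chosen for illustration):
# Hypothetical example: a 640x480 image containing one object of class 0
img_w, img_h = 640, 480
x_min, y_min, x_max, y_max = 100, 120, 300, 360

center_x = ((x_min + x_max) / 2) / img_w  # 0.3125
center_y = ((y_min + y_max) / 2) / img_h  # 0.5
width = (x_max - x_min) / img_w           # 0.3125
height = (y_max - y_min) / img_h          # 0.5

# One line of the corresponding annotation file:
print(f"0 {center_x} {center_y} {width} {height}")  # -> 0 0.3125 0.5 0.3125 0.5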
To create a dataset in this format, I recommend using a labeling tool. For this tutorial, I used Roboflow, but there are plenty of other open-source options as well!

Fine-Tuning YOLOv9

Fine-tuning the YOLOv9 model involves loading the pretrained model, configuring it for your specific dataset, and initiating the training process. W&B plays a vital role here by providing an interactive interface for monitoring the training process, visualizing metrics, and maintaining comprehensive records of training runs. Below is the script I used for fine-tuning:
from ultralytics import YOLO
import wandb

# Load a pretrained YOLOv9 model
model = YOLO("yolov9c.pt")
model_path = "yolov9c_finetunedv2.pt"

# Train the model on your dataset
results = model.train(
    data="./data.yaml",
    epochs=100,
    imgsz=640,
    project="yolo_training",  # W&B project name
    name="yolov9_finetune"    # W&B run name
)

# Save the fine-tuned model
model.save(model_path)

# Log the fine-tuned weights to W&B as a model artifact
wandb.init(project="yolo_training")
wandb.log_model(model_path)
wandb.finish()
First, the pretrained YOLOv9 model is loaded using the YOLO class from the ultralytics library; this model serves as the starting point for fine-tuning. Training on your custom dataset is then initiated by specifying the path to your dataset configuration file (data.yaml), the number of training epochs, and other relevant parameters such as image size. Through the Ultralytics W&B integration, the run is logged under a project named yolo_training, which lets you track progress, visualize losses, and compare different training runs; W&B continuously logs metrics throughout training, providing insight into the model's performance and helping you spot any issues that arise. Once training completes, the fine-tuned model is saved and logged to W&B, ensuring that all aspects of the process are thoroughly documented and easily accessible.
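Note that, depending on your ultralytics version, you may need to enable the W&B integration once before the project and name arguments are picked up as your W&B project and run names. One way to do this (an assumption about your setup, not something the script above enforces) is via the Ultralytics settings CLI:
yolo settings wandb=True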
After running this script, you will see logs appear inside W&B! Here are the logs from my run:


Run: yolov9_finetune11

Model Registry

Next, we will add our logged model to the W&B model registry. To do so, first open your project and access the Artifacts section, shown below:


Next, click the 'Link to Registry' button to add the model to your model registry!

After adding your model, you can navigate to the Model Registry page by clicking the 'Model Registry' button in the top left of the Artifacts page, as shown below:


After navigating to the model registry, you can click the model you added and then click the 'Usage' tab, which will show a code snippet demonstrating how to use the model:

To download your model, simply copy this snippet and substitute it into the inference code below!
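For reference, that snippet will typically look something like the following; the entity, registry path, and version tag here are placeholders, so use the exact values shown in your own Usage tab:
import wandb

run = wandb.init(project="yolo_training")
artifact = run.use_artifact('your-entity/model-registry/yolo:v0', type='model')
artifact_dir = artifact.download()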

Running Inference with Weave

Weave integrates inference operations with W&B, streamlining the process of running inference on images and visualizing the results. The following section demonstrates how to use Weave to run inference on an image with the fine-tuned YOLOv9 model.
Here's a script for running inference:
from ultralytics import YOLO
import requests
from PIL import Image
from io import BytesIO
import os
import wandb
import weave

# Initialize Weave and wandb with the same project name
project_name = "yolo_training"
weave.init(project_name)
run = wandb.init(project=project_name)

# Use the specified artifact
artifact = run.use_artifact('byyoung3/model-registry/yolo:v0', type='model')
artifact_dir = artifact.download()

# Define the path to the downloaded model
model_path = os.path.join(artifact_dir, "best.pt")

# Load the pretrained YOLOv9 model
model = YOLO(model_path)

# Function to run inference on a single image
@weave.op
def run_inference(image: Image.Image) -> dict:
    try:
        # Save the image locally for prediction
        local_image_path = 'temp_image.jpg'
        image.save(local_image_path)

        # Run the YOLO model on the image with adjusted confidence and NMS IoU thresholds
        results = model.predict(local_image_path, conf=0.7, iou=0.2)

        # Draw bounding boxes on the image and save the result
        results[0].save(local_image_path)
        result_image = Image.open(local_image_path)

        # Extract predictions
        predictions = []
        for box in results[0].boxes:
            class_id = int(box.cls)
            class_name = results[0].names[class_id]
            confidence = box.conf.item()
            coordinates = box.xyxy.tolist()
            predictions.append({
                'class': class_name,
                'confidence': confidence,
                'coordinates': coordinates
            })

        # Prepare the results
        result_data = {
            'result_image': result_image,
            'predictions': predictions
        }

        return result_data
    except Exception as e:
        return {'error': str(e)}

# Download the image from the URL
image_url = "https://i.ytimg.com/vi/7FmHydF9Gvg/hqdefault.jpg"
response = requests.get(image_url)
image = Image.open(BytesIO(response.content))

# Run inference using the downloaded image
inference_result = run_inference(image)
print(inference_result)


This code initializes Weave for the "yolo_training" project to facilitate call tracking and logging. It then retrieves the YOLOv9 model artifact from W&B, downloads it, and loads the model for inference. The run_inference function accepts a PIL image, saves it locally, and runs the YOLO model to detect objects with the specified confidence and IoU thresholds. It returns the annotated image along with prediction metadata containing the class, confidence score, and coordinates of each detected object.
If we navigate to our project page, we will see a button for "traces" which is where we can view our Weave data! Simply click this button and you will see the following screen!

This screen shows calls to your run_inference function. Click any individual call and you will see a visualization of that call, showing its inputs and outputs!



Conclusion

Fine-tuning YOLOv9 with custom data and utilizing W&B for logging presents a powerful approach for enhancing object detection models. The integration with Weave further simplifies the process of running inferences and visualizing results. By following this comprehensive guide, developers and researchers can effectively fine-tune YOLOv9 models and leverage W&B and Weave to streamline their workflows, achieving superior performance and gaining valuable insights into their models.

