Object detection and tracking with YOLOv8
In this article, we'll cover the basics of YOLOv8, including setting up your machine for YOLOv8, and then dive into creating a custom object tracker with YOLOv8.
It's no secret that YOLO models have revolutionized the field of Computer Vision. They deliver state-of-the-art performance on real-time object detection tasks, and YOLOv8 is the latest model in the series.
Understanding the new features and improvements made in YOLOv8 can be challenging, so to help make these ideas concrete, we'll explain them through a story. In our case, we'll set the scene by solving problems through the perspective of Laxman, a novice forest ranger.
Throughout the article, there will be a few exercises to help improve your understanding. My recommendation is to digest this piece in multiple reading and coding sessions. Good luck!
Here's what we'll be covering:
Table of Contents
- Our YOLOv8 project
- Introduction to YOLOv8
  - What is YOLOv8?
  - Is YOLOv8 Open Source?
  - Who is the Author of YOLOv8?
  - When Was YOLOv8 Released?
  - What are the Key Architectural Specs and Results?
  - What Are the Enhancements Over Previous Versions?
- Setting up your machine for YOLOv8
  - 1. Setting up Conda Environment
  - 2. Installing PyTorch
  - 3. Installing Ultralytics
  - 4. Selenium
  - 5. Supervision
- Chat with Narayanan
- What tasks can YOLOv8 be used for?
  - 1. Classification
  - 2. Object Detection
  - 3. Segmentation
- Homestretch
  - Step 1: Data Collection
  - Step 2: Data Annotation
  - Step 3: Experiment Tracking With W&B
  - Step 4: Training YOLOv8
  - Step 5: Tracking Wildlife
- Serene Malgudi
- Conclusion
- Credits and Recommendations
Let's get going!
Our YOLOv8 project
Malgudi, a picturesque town surrounded by forests, faces frequent wild animal encounters. Laxman, a nature-loving boy, wants to prevent harm to both animals and residents after wild elephants raided his family's farm. He proposes installing CCTVs to detect stray animals and alert the town.
Seeking guidance from his developer friend Narayanan, Laxman begins exploring YOLO models, focusing on YOLOv8, to bring his plan to life.
Introduction to YOLOv8
What is YOLOv8?
YOLOv8 is from the YOLO family of models and was released on January 10, 2023. YOLO stands for You Only Look Once, and this series of models are thus named because of their ability to predict every object present in an image with one forward pass.
The main distinction introduced by the YOLO models was the framing of the task at hand. The authors of the paper reframed the object detection task as a regression problem (predict the bounding box coordinates) instead of classification.
YOLO models are pre-trained on huge datasets such as COCO and ImageNet. This gives them the simultaneous ability to be the Master and the Student. They provide highly accurate predictions on classes they are pre-trained on (master ability) and can also learn new classes comparatively easily (student ability).

Master and Student YOLOs, Image by Author
YOLO models are also faster to train and have the ability to produce high accuracy with smaller model sizes. They can be trained on single GPUs, making them more accessible to developers like us.
YOLOv8 is the latest iteration of these YOLO models (as of early 2023). It has undergone a few major changes from its ancestors, such as anchor-free detection, updated convolutional blocks (C2f modules in place of the older C3 blocks), and changes to how mosaic augmentation is applied during training.
Is YOLOv8 Open Source?
YOLOv8 is an Open Source SOTA model built and maintained by the Ultralytics team. It is distributed under the GNU General Public License, which authorizes the user to freely share, modify and distribute the software. The YOLOv8 community is vibrant and ever-growing.
Who is the Author of YOLOv8?
YOLOv8 is written and maintained by the Ultralytics team. The YOLO models were originally created by Joseph Redmon, a computer scientist. He developed the first three iterations, up to YOLOv3, all written in the Darknet framework.
Glenn Jocher re-implemented YOLOv3 in PyTorch, made a number of changes along the way, and released YOLOv5. YOLOv5's architecture was then modified to develop YOLOv8.
When Was YOLOv8 Released?
YOLOv8 was officially released on January 10th, 2023. As of writing, it is still under active development.
What are the Key Architectural Specs and Results?
Currently, there is no official paper out yet. But thanks to GitHub user RangeKing, who painstakingly went through the yolov8-p2.yaml file, we have the following illustration:

Model Architecture of YOLO, Image by GitHub user RangeKing
Visit this link to see a clearer illustration by RangeKing. This visual is recognized and approved by the Ultralytics team, so we can proceed with confidence.
We can also visualize the YOLOv8 architecture ourselves by converting the model to ONNX format. ONNX stands for Open Neural Network Exchange; it is an open format used to represent machine learning models. It can be thought of as a universal representation of an ML model, regardless of whether the model was originally written in PyTorch, TensorFlow, or any other framework.
Converting any YOLO model to ONNX is easy. Copy the following code block and run it:
!pip install ultralytics

from ultralytics import YOLO

model = YOLO('yolov8m.pt')
model.export(format = "onnx")
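If you'd like to sanity-check the exported file programmatically before opening it in Netron, here is a minimal sketch using onnxruntime (an extra dependency not covered above, so run pip install onnxruntime first):

import onnxruntime as ort

# Load the exported graph and print its input names and shapes.
session = ort.InferenceSession("yolov8m.onnx")
for inp in session.get_inputs():
    print(inp.name, inp.shape)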
We can now take this .onnx file and upload it to the Netron app. Netron is an application that visualizes neural networks.
Here is yolov8m visualized:

Netron App Visualizes yolov8m , Image by Author
Now, we can compare the changes in YOLOv8 with its predecessor, YOLOv5.
There are two main changes:
- Anchor-Free Detection
- Mosaic Augmentation
Let's look at each of these changes in detail:
1. Anchor-Free Detection
To understand anchor-free detection, we first need to understand anchor boxes.
Anchor boxes solved a major problem in object detection. Before anchor boxes, an object was assigned to the grid cell that contained its midpoint. If two objects had the same center point, constructing bounding boxes and assigning them to individual classes became quite tricky.
For example, consider the situation in which a human and a horse have the same center point. How should we construct a bounding box in such a case?

Same Center Objects, Image by Author
Anchor boxes can be thought of as cookie-cutter templates. Consider the case in which we have two anchor boxes: Anchor Box 1 and Anchor Box 2.
Now, we check which anchor box in the list has the highest IoU (overlap) with the ground-truth bounding box and assign the object to that anchor box.
As visualized below, Anchor Box 1 is useful for horizontally elongated figures such as horses, and Anchor Box 2 is useful for vertically elongated figures such as humans.

Anchor Boxes and Objects, Image by Author
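The assignment rule described above can be written down in a few lines. Here is a minimal sketch with two made-up anchor templates and a single ground-truth box; the coordinates are arbitrary:

import numpy as np

def iou(box_a, box_b):
    # IoU of two boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

anchor_1 = (0, 40, 100, 80)       # wide template (horse-like shapes)
anchor_2 = (40, 0, 60, 100)       # tall template (human-like shapes)
ground_truth = (10, 35, 95, 85)   # a horizontally elongated ground-truth box

ious = [iou(a, ground_truth) for a in (anchor_1, anchor_2)]
best = int(np.argmax(ious))
print(f"Assign the object to Anchor Box {best + 1} (IoU = {ious[best]:.2f})")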
Anchor Boxes generally improved training by increasing mAP. They were incorporated in previous YOLO models. In YOLOv8, the architecture moved away from Anchor Boxes for a few reasons:
- Lack of Generalization: Training with prebuilt anchors makes the model rigid and harder to fit to new data.
- Poor Fit for Irregular Shapes: Irregularly shaped objects cannot be mapped cleanly onto rectangular anchor boxes.
2. Mosaic Data Augmentation
During training, YOLOv8 does many augmentations to training images. One such augmentation is mosaic data augmentation.
Mosaic data augmentation is a simple technique in which four different images are stitched together and fed into the model as a single input. This teaches the model to recognize objects in new positions, at different scales, and under partial occlusion.

Mosaic Augmentation (Sneak Peek into Future Training), Image by Author
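To make the idea concrete, here is a toy NumPy sketch that stitches four images into a 2x2 mosaic. It is illustrative only: the real implementation inside ultralytics also rescales the source images and remaps their bounding boxes onto the new canvas.

import numpy as np

def mosaic(img1, img2, img3, img4, out_size = 640):
    # Place a crop of each image into one quadrant of the output canvas.
    half = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype = np.uint8)
    canvas[:half, :half] = img1[:half, :half]
    canvas[:half, half:] = img2[:half, :half]
    canvas[half:, :half] = img3[:half, :half]
    canvas[half:, half:] = img4[:half, :half]
    return canvas

# Four random arrays stand in for real training images.
imgs = [np.random.randint(0, 255, (640, 640, 3), dtype = np.uint8) for _ in range(4)]
print(mosaic(*imgs).shape)   # (640, 640, 3)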
Applying mosaic data augmentation for the entire training run has been shown to hurt performance, so in YOLOv8 it is switched off for the last 10 epochs.
What Are the Enhancements Over Previous Versions?
As there are no official results from the paper, we are going to go through the official YOLO comparison plot from the repository.

Comparing different YOLO versions, Image from Ultralytics YOLOv8 repo
As we can observe from the plot, YOLOv8 has more parameters than its predecessors, such as YOLOv5, but fewer parameters than YOLOv6. It offers about 33% more mAP for n-size models and generally a greater mAP across the board.
From the second graph, we can observe that YOLOv8 offers faster inference than the other YOLO models shown at a comparable level of accuracy.
Within YOLOv8, we have different model sizes: yolov8n (nano), yolov8s (small), yolov8m (medium), yolov8l (large), and yolov8x (extra large).

Comparing Model Sizes, Image from Ultralytics YOLOv8 repo
mAP increases with model size, while inference speed decreases. Bigger models take more time per image but detect objects more accurately (higher mAP); smaller models are faster but have comparatively lower mAP. Bigger models can also help when training data is limited, while smaller models are more efficient when compute or storage is limited (edge scenarios).
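If you want to see this speed/size trade-off on your own hardware, a rough sketch like the one below can help. The timings are indicative only, and the dummy array is a stand-in for a real photo:

import time
import numpy as np
from ultralytics import YOLO

image = np.zeros((640, 640, 3), dtype = np.uint8)   # dummy image; swap in a real one

for size in ("n", "s", "m"):
    model = YOLO(f"yolov8{size}.pt")   # weights are downloaded automatically on first use
    model(image)                       # warm-up run
    start = time.perf_counter()
    model(image)
    elapsed = (time.perf_counter() - start) * 1000
    print(f"yolov8{size}: {elapsed:.1f} ms per image")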
Setting up your machine for YOLOv8
I have created this GitHub repository which should be sufficient to handle the complete Wildlife Tracking Project.
Download the zip and extract it. Navigate to the folder and install the following libraries and configurations.
1. Setting up Conda Environment
First, we need to create a custom environment for this project.
conda create -n yolo_env python==3.8
Then, activate this environment.
conda activate yolo_env
2. Installing PyTorch
PyTorch is the underlying framework on top of which Ultralytics is built. PyTorch makes it easy to switch training from CPUs to GPUs. It does a whole lot of other things, but we are going to primarily use it to switch training to GPU.
conda install pytorch torchvision torchaudio -c pytorch
3. Installing Ultralytics
Previously, to use YOLO models, we had to clone the YOLO repo and run our code from inside that directory. Now, Ultralytics packages the YOLO models as a pip package.
We can easily install them by running the following command.
pip install -U ultralytics
4. Selenium
Selenium is a browser automation library. We will be using Selenium to scrape images off the internet.
pip install selenium
5. Supervision
Supervision is a package that has a lot of repetitive computer vision utilities built into it. We will be using Supervision to create a Tracker Line, which is used to count the number of objects crossing the line.
Note: I have used supervision==0.3 throughout the project. Upgrading to the latest versions could potentially break the code.
We can install supervision through pip
pip install supervision==0.3
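Before moving on, a quick sanity check that everything installed correctly can save debugging time later. A minimal sketch to run inside the activated yolo_env:

import torch
import ultralytics
import supervision
import selenium

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("Ultralytics:", ultralytics.__version__)
print("Supervision:", supervision.__version__)   # should print 0.3.x
print("Selenium:", selenium.__version__)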
Now that we've set up the local environment, it's time to visit Narayanan!
Chat with Narayanan
After setting up his local computer for performing Object Tracking, Laxman visited Narayanan’s house. While drinking tea (sourced from the hills of Ooty), they discussed Laxman’s project.
Laxman has some theoretical understanding of YOLO models, but he lacks coding experience. To familiarize himself with coding with YOLOv8, Narayanan gave him 3 tasks:
- Create an object classifier that can look through an image and identify classes present in the image.
- Create an object detector that can parse through a video and draw bounding boxes around objects present in the video.
- Create an object segmentation model which can look through an image and segment different objects present in the image.
Before Laxman could dive into those tasks, he first needed to know the types of tasks YOLOv8 is used for.
What tasks can YOLOv8 be used for?
YOLOv8 can be used for 3 major computer vision tasks.
1. Classification
Classification is a simple task in which our model needs to identify and output a single class that is present predominantly in the input image.
The output of a classification task is a class index and a confidence score.
In general, classification is useful only when we need to know whether a certain class is present in the input image. One important point: it tells us only that an object is present, not where it is located.
Creating a classifier is easy with YOLOv8. All we need to do is add the -cls suffix to the YOLOv8 model size we want.
YOLOv8 comes in different sizes, such as yolov8n (Nano), yolov8s (Small), yolov8m (Medium), yolov8l(Large), and yolov8x (Extra Large).
It is able to perform classification really well because of its pretraining on the ImageNet dataset (a huge dataset containing millions of images).
Now, let's create a classifier with the yolov8n configuration.
from ultralytics import YOLO

# Load the model
model = YOLO('yolov8n-cls.pt')

# Classification result
result = model('SOURCE_PATH')
Exercise 1
Load the classifier version of the yolov8m model and perform classification on the following image. The image is available in the Assets folder from the GitHub repository.
Hopefully, it was an easy task. Here’s the solution:
from ultralytics import YOLO

# Load the model
model = YOLO('yolov8m-cls.pt')

# Classification result
result = model('../Assets/Bird.jpg', save = True, project = "../Results/")

Classification of Hummingbird, Image by Author
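Beyond the saved image, the predicted class index and confidence can also be read off the Results object. Here is a hedged sketch: it assumes result[0].probs is a plain tensor of class probabilities, which was the case for ultralytics releases around the time of writing (newer releases wrap it in a Probs object):

import torch
from ultralytics import YOLO

model = YOLO('yolov8m-cls.pt')
result = model('../Assets/Bird.jpg')

probs = torch.as_tensor(result[0].probs)   # assumption: plain tensor of class probabilities
top_idx = int(probs.argmax())
print(result[0].names[top_idx], float(probs[top_idx]))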
2. Object Detection
Object Detection is the next step up from classification. In Object Detection, we need to identify the different classes present in the image and detect their exact locations.
The location of such objects is visually shown through Bounding Boxes.
YOLOv8 models are pretrained on the COCO dataset (another huge image dataset), so they can perform Object Detection out of the box.
There is no need for any suffix.
Here’s an example of performing Object Detection on an image:
from ultralytics import YOLO

# Load the model
model = YOLO('yolov8n.pt')

# Object Detection result
result = model('SOURCE_PATH')
Exercise 2
Load the Object Detection version of the yolov8s model and test it out on the DogVid video available in the Assets folder of the GitHub repository.
Here’s the solution:
from ultralytics import YOLO

# Load the model
model = YOLO('yolov8s.pt')

# Object Detection result
result = model('../Assets/DogVid.mp4', save = True, project = "../Results/")
The generated video is available in the GitHub repo under Results/ directory.
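If you want to work with the detections programmatically rather than just saving the annotated output, the Results object exposes a boxes attribute. A minimal sketch on a single test image (the path is a placeholder):

from ultralytics import YOLO

model = YOLO('yolov8s.pt')
result = model('SOURCE_PATH')   # path to any test image

for box in result[0].boxes:
    cls_id = int(box.cls)                    # predicted class index
    conf = float(box.conf)                   # confidence score
    x1, y1, x2, y2 = box.xyxy[0].tolist()    # corner coordinates in pixels
    print(result[0].names[cls_id], round(conf, 2), [round(v) for v in (x1, y1, x2, y2)])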
3. Segmentation
Segmentation is the next rung after Object Detection. In Object Detection, we identified objects and approximated their locations with Bounding Boxes. In Segmentation, we identify the individual pixels that belong to each object.
It is much more precise than Object Detection and has a huge range of applications, such as Medical Imaging, Satellite Imaging, etc.
It is just as easy to perform Segmentation with YOLOv8: all you need to do is add the -seg suffix.
from ultralytics import YOLO

# Load the model
model = YOLO('yolov8n-seg.pt')

# Segmentation result
result = model('SOURCE_PATH')
Exercise 3
Load the segmentation model and segment the Apples.jpg present in the Assets folder of the GitHub repository.
Here’s the solution:
from ultralytics import YOLO

# Load the model
model = YOLO('yolov8m-seg.pt')

# Segmentation result
result = model('../Assets/Apples.jpg', save = True, project = "../Results/")

Segmentation of Apples, Image by Author
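The predicted masks can also be accessed programmatically. A hedged sketch, assuming masks.data is a tensor of shape (num_objects, height, width); the exact attribute layout has shifted between ultralytics releases:

from ultralytics import YOLO

model = YOLO('yolov8m-seg.pt')
result = model('../Assets/Apples.jpg')

masks = result[0].masks.data   # assumption: one binary mask per detected object
print(masks.shape, "-", len(masks), "object masks")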
Homestretch
Laxman, after getting hands-on with YOLOv8, is finally ready to work on his Wildlife Tracker. He sets up a face-to-face meeting with Narayanan as quickly as possible.
The day of their meeting is here, and Laxman is excited. Narayanan, in the meantime, created a step-by-step architecture for Laxman to follow.
The guide has every step marked clearly. Laxman takes this architecture home and gets started on building the very first Wildlife Tracker.
Here’s the architecture:

Custom Tracker Architecture, Image by Author
Step 1: Data Collection
Data Collection is the first stage in any ML project. Remember that the model’s quality depends on the type of data we collect.
There are many easy ways to collect images, such as the Download All Images plugin, which generally makes downloading images from the internet painless. But I have a tendency to make things challenging for myself.
We are going to use the Selenium library to scrape images off the internet.
Navigate to the 1_Data_Collection directory in the project GitHub repository. You can just run the data_collection.py. This will create an images directory in another subfolder - 2_Data_Annotation.
I modified code originally written by Ivan Goncharov. His version uses some now-deprecated methods, but his tutorial is amazing, and I highly recommend going through it to understand web scraping with Selenium.
I am going to explain the changes I made in his code, so you can make modifications if you want to customize it for some other class of objects.
All you need to do to customize it for your own object class is to understand the scrape_images method (a short usage sketch follows the list below). It takes three arguments:
- search_term: The object class you want to scrape, e.g., Indian Tiger. We need to replace the space with a +, so the search_term becomes indian+tiger.
- number_of_images: Number of images you want from that object class.
- starting_number: The index to start numbering downloaded images from, so new downloads don't overwrite previously scraped ones. For example, if we already have 10 images of Tiger (indexed from 0) and want to scrape Lion next, we set starting_number to 10.
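Here is the promised usage sketch. These are illustrative calls placed inside 1_Data_Collection/data_collection.py, using the three arguments described above; the classes and counts are just an example:

# Hypothetical calls inside data_collection.py
scrape_images("indian+tiger", 100, 0)        # images indexed 0-99
scrape_images("indian+elephant", 100, 100)   # images indexed 100-199
scrape_images("asiatic+lion", 100, 200)      # images indexed 200-299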
After running the script, we should have about 600 images in the 2_Data_Annotation/images folder. Some of these images could be corrupt. We will remove them in upcoming sections.
Step 2: Data Annotation
I took the liberty of integrating Ivan’s ModifiedOpenLabelling into my sub-directory. I have modified some of the code in the train_test_split.py, but the run.py remains the same as the original.
Navigate to the 2_Data_Annotation subdirectory and keep it as the working directory in your terminal.
I have already updated the class_list.txt to fit our list of animals. You can change this if you want to Detect some other class of animals.
Our class_list.txt is as follows:
bengal_tiger
indian_elephant
indian_rhinoceros
indian_bison
indian_leopard
asiatic_lion
Now, type the following command to start annotating.
python3 run.py
We need to know a few main keys to operate the UI:
- Press A to go to the previous image
- Press D to go to the next image
- Press W to go to the previous class
- Press S to go to the next class
- Press Q to quit
Left-click to fix the top-left corner of the bounding box, then click again to fix the bottom-right corner. Right-click inside a bounding box to delete it if you make a mistake.
Now, we are going to run the train_test_split.py using the following command:
python3 train_test_split.py
This will create the following project structure in the main project directory.

Project Structure of Images and Labels, Image by Author
Navigate to the main project directory by typing the following command:
cd ..
I have already created a Custom YAML file. You can alter this if you have custom classes.
It can be viewed by opening the wildlife_dataset.yaml
# class names
names: ['indian_tiger', 'indian_elephant', 'indian_rhinoceros', 'indian_bison', 'indian_leopard', 'asiatic_lion']

# number of classes
nc: 6

# location of train and test data
train: ".data/images/train"
val: ".data/images/val"
Step 3: Experiment Tracking With W&B
Weights and Biases (W&B) is a great tool to keep track of all your ML experiments. By using W&B Artifacts, we can track models, datasets, and results of each step of the ML pipeline.
One easy way to think about Artifacts: they are both the inputs and the outputs of a run, so they cover a lot of ground.
I personally would love to see how my training images are annotated, laid out in a table. If I want, I can also version this dataset and track changes to it.
Navigate to the 3_Experiment_Tracking_with_W&B folder and execute the upload_dataset.py.
The code is as follows:
import wandb
import os

config = {"project": "wildlife-yolov8", "num_of_classes": 6}

run = wandb.init(project = config["project"], config = config)

artifact = wandb.Artifact(name = "yolov8-data", type = "dataset")
artifact.add_dir("../data/")

wandb.log_artifact(artifact)
My dataset is available in the ../data/ folder. This code initializes a wandb run and logs our dataset using the wandb.Artifact, add_dir, and log_artifact methods.
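As a quick aside, once the artifact is logged it can be pulled back down in any later run. A minimal sketch, assuming the same wildlife-yolov8 project and the yolov8-data artifact name used above:

import wandb

run = wandb.init(project = "wildlife-yolov8", job_type = "download-dataset")
artifact = run.use_artifact("yolov8-data:latest", type = "dataset")
data_dir = artifact.download()   # local directory containing the dataset files
print(data_dir)
run.finish()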
Since I am not going to change anything in this dataset, I am going to upload and visualize the primary dataset through a workaround. In a future wandb release, we should be able to do this directly without the workaround.
You can skip this entire section and only execute the workaround by running the following command:
python3 visualize_dataset.py
If you are curious about the methods through which I am able to visualize the dataset, then read on. If not, skip to the next section.
3.1 Importing Libraries
We are going to import wandb to track experiments and upload the dataset. We are going to import os to perform file selection and manipulation.
import wandb
import os
3.2 Utilities and Configuration
We need to set location variables for training images, training labels, validation images, and validation labels. We should also set a class_list dictionary, which contains animals along with their class numbers.
PATH_TRAIN_IMAGES = "data/images/train"
PATH_TRAIN_LABELS = "data/labels/train"
PATH_VAL_IMAGES = "data/images/val"
PATH_VAL_LABELS = "data/labels/val"

class_labels = {
    0: 'indian_tiger',
    1: 'indian_elephant',
    2: 'indian_rhinoceros',
    3: 'indian_bison',
    4: 'indian_leopard',
    5: 'asiatic_lion',
}
3.3 Initializing W&B Run
Any ML experiment can be tracked with a run. A run is initialized with Weights and Biases by wandb.init. We will first create a configuration object which contains our project name and the number of animal classes. You can add more metadata to this config. Then, we will initialize the run with wandb.init.
config = {"project": "wildlife-yolov8", "num_of_classes": 6}

run = wandb.init(project = config["project"], config = config)
3.4 Helper Functions for Bounding Box Management
The first helper function we will create is box_dict_maker. It takes no_of_times (the number of bounding boxes in an image) and the bounding_box list, and returns the location of each bounding box parsed into a configuration dictionary.
This configuration dictionary is very important. To log any bounding box in W&B, we need to pass in two main keys: box_data and class_labels. class_labels will be the same for every wandb.Image, so we will focus on box_data. box_data is a list of dictionaries, which is an important distinction: every single dictionary inside box_data is treated as one bounding box, so if an image has multiple bounding boxes, we need to pass in multiple configuration dictionaries. Since we are uploading ground-truth training data, each dictionary needs three main sub-keys: position, class_id, and box_caption.
YOLO models use three main values to represent a bounding box (a small conversion sketch follows the list):
- middle: Coordinates of the center of the bounding box
- width: Width of Bounding Box
- height: Height of Bounding Box
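To make the format concrete, here is a small sketch converting one normalized YOLO label (center x, center y, width, height) into pixel corner coordinates; the numbers in the example are arbitrary:

def yolo_to_corners(cx, cy, w, h, img_w, img_h):
    # Convert a normalized YOLO box (center x/y, width, height) to pixel corners.
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    return x1, y1, x2, y2

# A label line like "0 0.5 0.5 0.4 0.3" on a 1280x720 image:
print(yolo_to_corners(0.5, 0.5, 0.4, 0.3, 1280, 720))   # (384.0, 252.0, 896.0, 468.0)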
We will parse through the bounding box label files, split the data into these configuration key values, and upload each image as a wandb.Image.
def box_dict_maker(no_of_times, bounding_box):
    box_list = []
    class_num = 0
    for n in range(no_of_times):
        intermediate = {
            "position": {
                "middle": [float(bounding_box[5*n + 1]), float(bounding_box[5*n + 2])],
                "width": float(bounding_box[5*n + 3]),
                "height": float(bounding_box[5*n + 4]),
            },
            "class_id": int(bounding_box[5*n + 0]),
            "box_caption": class_labels[int(bounding_box[5*n + 0])],
        }
        class_num = int(bounding_box[5*n + 0])
        box_list.append(intermediate)
    return (class_num, box_list)
Now, we will create a function that will parse through the label file and determine the number of bounding boxes present in the image. This will be sent as input along with the bounding_box argument to the box_dict_maker function.
def bounding_box_fn(file_name):
    box = []
    with open(file_name) as f:
        for w in f.readlines():
            for l in w.split(" "):
                box.append(l)
    no_of_times = int(len(box) / 5)
    class_num, box_list = box_dict_maker(no_of_times, box)
    final_dict = {
        "ground_truth": {
            "box_data": box_list,
            "class_labels": class_labels,
        }
    }
    return (class_num, final_dict)
Finally, we will create an execute function, which takes the image folder path, the label folder path, and a table name. It finds each image and its corresponding label file, calls bounding_box_fn to create the bounding box dictionaries, and attaches them to a wandb.Image that is logged in a wandb.Table.
def execute(PATH_IMAGE, PATH_TEXT, NAME):
    NAME_LIST = []
    for x in os.listdir(PATH_TEXT):
        if x.endswith(".txt"):
            NAME_LIST.append(x[:-4])

    tabular_data = []
    count = 0
    for x in NAME_LIST:
        # os.path.join guards against missing trailing slashes in the path constants
        box_path = os.path.join(PATH_TEXT, str(x) + ".txt")
        image_path = os.path.join(PATH_IMAGE, str(x) + ".jpg")
        class_num, final_dict = bounding_box_fn(box_path)
        tabular_data.append([count, wandb.Image(image_path, boxes = final_dict), class_labels[class_num]])
        count += 1

    columns = ['index', 'image', 'label']
    test_table = wandb.Table(data = tabular_data, columns = columns)
    run.log({NAME: test_table})
3.5 Running the Code
Finally, we can execute the code block.
execute(PATH_TRAIN_IMAGES, PATH_TRAIN_LABELS, "Test")
execute(PATH_VAL_IMAGES, PATH_VAL_LABELS, "Validation")
This workaround will become obsolete with future releases of wandb, where you should be able to achieve the same thing with a single training argument such as save_dataset. As of writing, those changes have not shipped yet, so this is the only alternative.
Step 4: Training YOLOv8
Now, we can finally get to the juicy part of our project. We are going to train YOLOv8 on our custom data. We have created the ideal folder structure for YOLOv8 training in the annotation step. We will use that data to train.
Navigate to the 4_Custom_train_YOLOv8 subdirectory.
As usual, we are going to run the train_yolov8n.py file first and then go through the code to understand it.
Run the following command:
python3 train_yolov8n.py
We can now break the code down and understand it further.
from ultralytics import YOLO
from wandb.integration.yolov8 import add_callbacks as add_wandb_callbacks

model = YOLO("yolov8n.yaml")
add_wandb_callbacks(model, project = "wildlife-yolov8")

results = model.train(data = "../wildlife_dataset.yaml", epochs = 1000, device = 0, save_period = 10)
We import YOLO from ultralytics and import add_wandb_callbacks from wandb.
We initialize the yolov8n model and train it for up to 1000 epochs. Typically, training won't last the full 1000 epochs: by default, the patience parameter is set to 50, so if the model has not improved considerably over the last 50 epochs, early stopping kicks in. If you want to run the complete 1000 epochs, set patience to a very large number, such as 1000.
The device argument decides whether training runs on the GPU or the CPU; set device = 0 to train on the first GPU. We also set save_period to 10, which saves a checkpoint every 10 epochs.
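If you do want the full 1000 epochs, the same call with patience spelled out explicitly might look like this (a sketch; patience is a standard argument of the ultralytics train API):

from ultralytics import YOLO

model = YOLO("yolov8n.yaml")

# Early stopping effectively never triggers when patience equals the epoch budget.
results = model.train(
    data = "../wildlife_dataset.yaml",
    epochs = 1000,
    patience = 1000,
    device = 0,
    save_period = 10,
)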
We will run the same code with a different model size, yolov8m, so that we can compare metrics.
Run the following from the terminal.
python3 train_yolov8m.py
After the training is over, we can save the weights and use them to compare object detections.
Run the following command from the terminal
python3 object_detection.py
Here are the results:

Results from yolov8n
You can stop here if all you needed was a Custom Object Detection Model. We are going to take it a step further to create a Custom Object Tracker. It is far easier than you imagine it to be, so hang on for the final part of our project.
Step 5: Tracking Wildlife
In this step, all we need to do is to integrate BoTSORT and ByteTrack with YOLOv8 to create custom Wildlife Trackers.
The latest version of Ultralytics provides inbuilt native tracking. It provides support for two trackers (as of now):
- BoT-SORT: A really good Multi-Object Tracker
- ByteTrack: A simple and fast Multi-Object Tracker
BoT-SORT performs way better tracking than ByteTrack, but ByteTrack is lean and fast.
We will use both of these trackers to track animals crossing a road. We will also use an amazing library called `Supervision` for line annotation and line-crossing detection. All of these tools make it easier to build and test a custom tracker.
Navigate to 5_Custom_Track_YOLOv8
Run the BoTSORT_track.py.
As always, we will break down the code and look into it further. Here is the code:
5.1 Importing Libraries
We will import YOLO from ultralytics, and VideoSink and VideoInfo from supervision. We will also import `supervision` itself as sv.
from ultralytics import YOLO

import supervision as sv
from supervision.video import VideoSink, VideoInfo
5.2 Setting Up Configuration and Line
We are going to set up the SOURCE_VIDEO_PATH and TARGET_VIDEO_PATH. We are also going to draw a line on the frame, which acts as a boundary in our application. If the animal crosses the line, we will count it as a threat.
We are constructing a vertical line that splits the video frame in half. Therefore, the starting point sits at half the width and zero height, and the ending point sits at half the width and full height.
START = sv.Point(904, 0)      # (half of the width, no height)
END = sv.Point(904, 1356)     # (half of the width, full height)

SOURCE_VIDEO_PATH = "/notebooks/Elephant.mov"
TARGET_VIDEO_PATH = "/notebooks/Elephant-BoTSORT.mp4"
5.3 Loading Video and Model
We will use supervision.VideoInfo to create a variable called video_info, which stores metadata about the source video, such as its width and height.
We will load our Custom Model with the YOLO function.
video_info = VideoInfo.from_video_path(SOURCE_VIDEO_PATH)

model = YOLO("../4_Custom_Train_YOLOv8/runs/detect/train2/weights/best.pt")
5.4 Constructing Lines and Annotators
We can now use our START and END variables to construct a LineZone. The LineZone object becomes useful later, when we need to determine whether a detection has crossed the line. We will also set thickness, text_thickness, and text_scale for the LineZoneAnnotator and BoxAnnotator.
line_zone = sv.LineZone(start = START, end = END)

line_zone_annotator = sv.LineZoneAnnotator(thickness = 2, text_thickness = 1, text_scale = 0.5)
box_annotator = sv.BoxAnnotator(thickness = 2, text_thickness = 1, text_scale = 0.5)
5.5 Tracking and Saving the Video
We will use VideoSink to save the tracked video frame by frame. For each frame, we run tracking and wrap the result in a supervision Detections object, from which we pull out tracker_id, class_id, and confidence to build the labels. We then call line_zone.trigger to check whether any object has crossed the line, annotate the frame, and write it out.
with VideoSink(TARGET_VIDEO_PATH, video_info) as sink:
    for result in model.track(source = SOURCE_VIDEO_PATH, project = "/notebooks/", stream = True,
                              agnostic_nms = True, tracker = "botsort.yaml"):
        frame = result.orig_img
        detections = sv.Detections.from_yolov8(result)

        if result.boxes.id is not None:
            detections.tracker_id = result.boxes.id.cpu().numpy().astype(int)

        labels = [
            f'#{tracker_id} {model.model.names[class_id]} {confidence:0.2f}'
            for _, confidence, class_id, tracker_id
            in detections
        ]

        frame = box_annotator.annotate(scene = frame, detections = detections, labels = labels)

        line_zone.trigger(detections = detections)
        line_zone_annotator.annotate(frame = frame, line_counter = line_zone)

        sink.write_frame(frame)
I am going to perform the same experiment with ByteTrack. All we need to do is change the tracker argument from `botsort.yaml` to `bytetrack.yaml`.
Run the following command from the terminal.
python3 ByteTrack_track.py
BoTSORT:
ByteTrack:
Serene Malgudi
By successfully building his wildlife tracker, Laxman gets local government assistance to install multiple CCTVs around Malgudi. He runs his software on the incoming video feed and creates an alert mechanism for his town.
Everyone in the town starts relying on Laxman's application to plan vegetation, celebration, and much more. Now, he is widely regarded as the Hero of Malgudi. He plans on traveling throughout India to install his software and help rural folks.
Conclusion
In this article, we performed many experiments and successfully built a wildlife tracker. We covered everything from the basics of YOLOv8 and setting up your machine, to training on custom data and finally creating a custom object tracker with YOLOv8.
Thank you for sticking with the piece, and if you have any questions, feel free to reach out!
Credits and Recommendations
A huge thanks to the following people and resources:
- Ivan Goncharov - ModifiedOpenLabeling