Collect and Label Images to Train a YOLOv5 Object Detection Model in PyTorch

This tutorial will guide you step by step through preparing a dataset to train a custom YOLOv5 model. Made by Dave Davies using Weights & Biases
Welcome to Part 2 of our YOLOv5 tutorial series! If you haven't checked out Part 1 of this series, I recommend reading that first; it covers how to install YOLOv5 for real object detection on Windows and Google Colab, which we'll assume you've done in this report.
That said, once you have set up the environment, you can dive into preparing the dataset.

Getting Started

Every day we see machines and robots that learn on their own and continue to improve without human intervention or explicit programming.
Have you ever wondered:
✓ How self-driving cars can see and drive safely?
✓ How Facebook can recognize you in images and tag you automatically?
✓ How algorithms can diagnose medical images for diseases?
✓ How crops are analyzed and classified into different categories based on ripeness level?
The core of all these amazing technologies lies in computer vision. The field is developing rapidly and in some cases has even surpassed humans at certain visual tasks.
While computer vision as a field covers many tasks, here we're going to focus on three of the more popular ones:
  1. image classification
  2. object detection
  3. image segmentation
Difference between classification, localization, detection and segmentation.
As in the previous tutorial, today we'll be focused primarily on object detection.

What Is Object Detection?

Object detection combines classification and localization to determine what objects are present in an image or video and where they are.
It's a type of supervised machine learning, which means we need to provide our algorithm with a training dataset that contains images along with their respective labels. This may seem simple, but deep learning models generally require large amounts of data to train them. In other words: a model needs a lot of examples before it can tell what's in an unlabeled image.
But as with people, the quality of what we feed the model matters as much as the quantity. The higher the quality of the data, the better the results.
Graph showing how increasing the amount of data improves performance.

Collecting Training Images

In this tutorial our major focus will be to look at different approaches for collecting data to train our custom model.
Below are some of the possible options:

Public Datasets

Here are just a few places where you can easily access datasets free of cost.
  1. Google Dataset Search: Google launched this search engine to enable researchers to access datasets easily and quickly. It contains more than 25 million datasets.
  2. Kaggle Datasets: Kaggle helps the data science community access machine learning datasets of all kinds. It is easily one of the best resources for this task.
  3. UCI Machine Learning Repository: Created in 1987, the UCI page is one of the oldest free dataset repositories in the world.
  4. Visual Data: As the name implies, this search engine contains datasets specifically for computer vision. It is a great source when you are looking for datasets related to classification, image segmentation and image processing.
  5. Papers With Code: A community for free and open-source research projects that contains code as well as datasets.

Data Augmentation

If we have a limited number of images, we can take advantage of data augmentation techniques.
This process involves expanding an existing dataset with modified copies of the images it already contains, for example by cropping, flipping, rotating, or changing contrast and brightness.
Using this technique we can increase the size of a small training set, since the modified copies function like new images from the perspective of most object detection models.
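As a minimal sketch of one such augmentation, the snippet below mirrors an image horizontally and adjusts its bounding boxes to match. It assumes the image is a plain nested list of pixel rows and that each box is a YOLO-style tuple of (class, x_center, y_center, width, height) with values normalized to the 0..1 range. The function name hflip_with_boxes is hypothetical; in practice you would likely reach for an image library instead.

```python
def hflip_with_boxes(image, boxes):
    """Mirror an image left-to-right and update its normalized boxes.

    image: list of rows, each row a list of pixel values.
    boxes: list of (class_id, x_center, y_center, width, height),
           with coordinates normalized to the 0..1 range (assumed layout).
    """
    flipped_image = [row[::-1] for row in image]
    # Only the horizontal center moves; size and vertical position are unchanged.
    flipped_boxes = [(c, 1.0 - x, y, w, h) for (c, x, y, w, h) in boxes]
    return flipped_image, flipped_boxes


image = [[1, 2, 3],
         [4, 5, 6]]
boxes = [(0, 0.25, 0.5, 0.2, 0.4)]
new_image, new_boxes = hflip_with_boxes(image, boxes)
print(new_image)  # [[3, 2, 1], [6, 5, 4]]
print(new_boxes)  # [(0, 0.75, 0.5, 0.2, 0.4)]
```

The point to take away is that any geometric augmentation has to be applied to the labels as well as the pixels, otherwise the boxes no longer line up with the objects.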

Free Online Images

You may have heard but the internet has a lot of images on it. Resourceful folks can write simple scrapers to collect images for their desired datasets.
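A simple scraper usually starts by finding the image links on a page. The sketch below uses only the Python standard library to pull every img source out of a chunk of HTML; the class and function names are hypothetical, and the sample HTML and example.com URLs are placeholders rather than a real source of images.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collect the absolute URL of every <img> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL.
                self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return parser.image_urls


# Placeholder HTML standing in for a downloaded page.
sample_html = (
    '<html><body><img src="/img/bus1.jpg">'
    '<img src="https://example.com/door.png"></body></html>'
)
urls = extract_image_urls(sample_html, "https://example.com/gallery/")
print(urls)  # ['https://example.com/img/bus1.jpg', 'https://example.com/door.png']
```

From there, each URL could be saved to disk with something like urllib.request.urlretrieve.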

Taking Photographs

Another possible option? Pick up your camera and capture images. It's hard to scale this out but you can capture the exact images you need for a given task.

Labeling Data Using Labeling Tools

There's a limited amount of annotated data currently available for object detection. That can be a hurdle.
If you can't find an annotated dataset from the resources mentioned above, you'll need to create one yourself.
There are many tools available on GitHub that you can use to annotate images for free.
A few of the more popular are:
And some paid tools:
I am going to use the open-source repo OpenLabeling, which is powered by OpenCV. But before we start to label data, we need to understand that there are different bounding box formats.
There are:
Each format uses its own representation of bounding box coordinates. YOLOv5 and other YOLO networks use two files with the same name but different extensions.
One is the JPEG image file and the other is a .txt file that stores information about the labels within the image.
The number of rows indicates the number of objects present in the image. Each row has five parameters: the class ID, the x and y coordinates of the box center, and the box width and height.
The coordinates and box dimensions are normalized between zero and one, as a fraction of the image dimensions.
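To make the normalization concrete, here is a small sketch that converts a box given by its pixel corners into one YOLO label row of the form class, x_center, y_center, width, height. The function name to_yolo_row and the example numbers are made up for illustration.

```python
def to_yolo_row(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box into one YOLO label row."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 200x200 pixel box of class 2 in a 640x480 image.
row = to_yolo_row(2, 100, 200, 300, 400, img_w=640, img_h=480)
print(row)  # 2 0.312500 0.625000 0.312500 0.416667
```

Because every value is a fraction of the image size, the same label row stays valid if the image is later resized.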
I will be using the OpenLabeling tool to label the images.
Let’s get started.

Get the Labeling Repo From GitHub

First you need to download and unzip the labeling repo from GitHub.
Here is the link
Once the download is complete, extract the folder.
You'll need to put the images to be labeled in the images folder.
As noted previously, the labels for each image will be created in YOLO format in the bbox_txt folder, with the same name as the image but a .txt extension.
Make sure that you have installed the libraries listed in the requirements.txt file so you can run the modified OpenLabeling tool.
To do this, simply open the folder location, enter cmd in the address bar and type:
pip install -r requirements.txt
To launch the tool, enter:
Which should produce:
The sliding window bar at the top is used to switch the images.
You can use these shortcut keys to navigate
The classes are specified in the class.txt file. In our case we have four classes: closed door, opened door, bus and number.
All the bounding boxes that you draw on an image will be automatically added to its .txt file.
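Before moving on, it can be worth sanity-checking the generated label files. The sketch below is a hypothetical helper, not part of OpenLabeling: it takes the text of one label file and reports rows that don't have five values, use a class ID outside the expected range, or have coordinates outside 0..1.

```python
def check_label_rows(text, num_classes):
    """Return a list of problems found in the contents of one YOLO label file."""
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split()
        if len(parts) != 5:
            problems.append(f"line {line_no}: expected 5 values, got {len(parts)}")
            continue
        try:
            class_id = int(parts[0])
            coords = [float(p) for p in parts[1:]]
        except ValueError:
            problems.append(f"line {line_no}: non-numeric value")
            continue
        if not 0 <= class_id < num_classes:
            problems.append(f"line {line_no}: class id {class_id} out of range")
        if any(not 0.0 <= v <= 1.0 for v in coords):
            problems.append(f"line {line_no}: coordinates must be between 0 and 1")
    return problems


good = "0 0.5 0.5 0.2 0.3\n3 0.1 0.2 0.05 0.05\n"
bad = "4 0.5 0.5 0.2 0.3\n1 0.5 1.5 0.2\n"
print(check_label_rows(good, num_classes=4))  # []
print(check_label_rows(bad, num_classes=4))
```

To check a real file, you could pass in the result of reading it, for example Path("bbox_txt/some_image.txt").read_text(), where the file name is a placeholder.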

Split The Data Into Training And Validation:

Once the labeling is done, we will split our data into training and validation sets.
The split ratio is set in the file, and you can change it to suit your needs.
Execute it by typing the following into the command prompt:
This will create a custom_dataset directory containing the data split into train and val folders.
And appear as:
Copy the custom_dataset folder into the YOLOv5 folder (created in Part 1).
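If you'd like to see what a split like this involves, here is a rough sketch of the idea. It is not the script from the labeling repo: the function name, the 80/20 default, the .jpg-only glob and the flat train/val layout with labels next to images are all assumptions, so match them to whatever your training setup expects.

```python
import random
import shutil
from pathlib import Path


def split_dataset(images_dir, labels_dir, out_dir, val_ratio=0.2, seed=0):
    """Copy image/label pairs into out_dir/train and out_dir/val."""
    images = sorted(Path(images_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)  # fixed seed keeps the split reproducible
    n_val = int(len(images) * val_ratio)
    subsets = {"val": images[:n_val], "train": images[n_val:]}
    for name, files in subsets.items():
        target = Path(out_dir) / name
        target.mkdir(parents=True, exist_ok=True)
        for image in files:
            label = Path(labels_dir) / (image.stem + ".txt")
            shutil.copy(image, target / image.name)
            if label.exists():  # images without objects may have no label file
                shutil.copy(label, target / label.name)
    return {name: len(files) for name, files in subsets.items()}


# Example call (folder names assumed from this tutorial):
# counts = split_dataset("images", "bbox_txt", "custom_dataset", val_ratio=0.2)
```

Shuffling before splitting matters when images were captured in sequence, since otherwise the validation set could end up containing only one scene.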

Creating The YAML File For Our Data:

Before we create the YAML file, let's start by answering a question you may have:

What Is A YAML File Anyway?

A YAML file is a document written in YAML (YAML Ain't Markup Language). YAML files are generally used for configuration.

Creating Your YAML File

Go to the data folder in yolov5-master and create a new file named custom_data.yaml.
Finally, edit custom_data.yaml.
Update the train and val paths:
train: custom_dataset/train/
val: custom_dataset/val/

# number of classes
nc: 4

# class names
names: ['closed_door', 'opened_door', 'bus', 'number']
Which should look like:
And don't forget to save.
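One easy mistake in this file is letting nc drift out of sync with the names list. As a hedged sketch, the hypothetical helper below builds the same YAML text from a single list of class names, so the count always matches. It uses plain string formatting rather than a YAML library.

```python
def build_data_yaml(train_path, val_path, class_names):
    """Return the text of a dataset YAML file in the layout used in this tutorial."""
    quoted = ", ".join(f"'{name}'" for name in class_names)
    return (
        f"train: {train_path}\n"
        f"val: {val_path}\n"
        "\n"
        "# number of classes\n"
        f"nc: {len(class_names)}\n"  # derived from the list, so it cannot mismatch
        "\n"
        "# class names\n"
        f"names: [{quoted}]\n"
    )


classes = ["closed_door", "opened_door", "bus", "number"]
yaml_text = build_data_yaml("custom_dataset/train/", "custom_dataset/val/", classes)
print(yaml_text)
```

The resulting text could then be written out with something like Path("data/custom_data.yaml").write_text(yaml_text).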

Using W&B Artifacts To Upload The Dataset

If you haven't yet done so, you'll need to take a couple minutes to create a free W&B account.
Once you've done that you'll quickly create a W&B project:
Then you'll simply install W&B into YOLO with:
pip install wandb
Next we log in to W&B using this command:
wandb login
Note: If you get the error "'wandb' is not recognized as an internal or external command, operable program or batch file.", you likely haven't added the Scripts directory of your Python installation to your PATH. We discussed how to do this in part one here. As a workaround, you can simply enter: python -m wandb login
And now it's time to log our dataset. We do this by entering:
python utils/loggers/wandb/ --project YOLO --data data/custom_data.yaml
Making sure to change the project name to yours.
And voila ...
Now you can find them safe-and-sound in your W&B Artifacts.
One way to easily find them all is in your val.table.
A handy bonus for those of you wanting to share with your colleagues is that this table and many other features can be embedded interactively into reports.
Some people learn better through posts, and some through video. If you had any trouble following along, take a look at this great video by W&B's Ivan Goncharov, and if you have any questions, please let us know in the comments below.

A Video Walkthrough