Collect and Label Images to Train a YOLOv8 Object Detection Model
This tutorial walks you through preparing a dataset to train a custom YOLOv8 model, step by step.

Object detection models like YOLOv8 (You Only Look Once version 8) have revolutionized computer vision applications by enabling accurate real-time object detection in images and videos. However, the effectiveness of these models heavily depends on the quality and quantity of the training data used. Collecting and labeling images plays a crucial role in training a robust YOLOv8 model.
In this guide, we'll walk through the process of collecting images and labeling them for YOLOv8 model training.
YOLOv8 is used primarily for image classification, object detection, and image segmentation. Object detection is the most popular of these use cases, so it's the task this tutorial focuses on.

Here's what we'll be covering:
- Gathering Images
- Step One: Identify Object Classes
- Step Two: Data Collection
- Step Three: Quality Control
- Image Labeling
- Step Four: Choosing Labeling Tools
- Step Five: Annotation
- Step Six: Quality Assurance
- Data Preprocessing
- Step Seven: Data Augmentation
- Step Eight: Splitting The Data
- Creating the YAML file for training
- Using W&B Artifacts To Upload The Dataset
- Installing YOLOv8
- Closing Up
Let's get going.
Gathering Images
You can't label images you don't have, so the first step is collecting them. Doing this involves:
Step One: Identify Object Classes
Define the classes of objects you want the model to detect. For instance, if you're building a model to detect cars, pedestrians, and bicycles, identify these as your target classes.
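Since YOLO-format annotations and the training config identify classes by integer index, it helps to pin the mapping down early. Here is a minimal sketch using the example classes above:

# The order of this list fixes each class's integer index
CLASS_NAMES = ["car", "pedestrian", "bicycle"]
CLASS_TO_ID = {name: i for i, name in enumerate(CLASS_NAMES)}  # {"car": 0, "pedestrian": 1, "bicycle": 2}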
Step Two: Data Collection
Collect a diverse set of images containing the objects you want to detect. This can be done through various means:
Use public datasets: Numerous publicly available datasets like COCO, VOC, and Open Images contain labeled images for object detection.
- Google Dataset Search: Google launched this search engine to enable researchers to access datasets easily and quickly. It contains more than 25 million datasets.
- Kaggle Datasets: Kaggle helps the data science community access machine learning datasets. It is easily one of the best resources for this task.
- UCI Machine Learning Repository: Created in 1987, the UCI page is one of the oldest free dataset repositories in the world.
- Visual Data: As the name implies, this search engine contains datasets specifically for computer vision. It is a great source when looking for datasets related to classification, image segmentation, and image processing.
- Papers With Code: A community for free and open-source research projects containing code and datasets.
Scraping the web: Utilize web scraping tools or APIs to gather images from search engines, social media platforms, or specific websites relevant to your application.
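If part of your collection comes from URLs (for example, ones gathered by a scraper), a minimal download sketch with the requests library could look like this; the URL list and output folder are placeholders:

import pathlib
import requests

urls = ["https://example.com/img1.jpg", "https://example.com/img2.jpg"]  # placeholder URLs
out_dir = pathlib.Path("raw_images")
out_dir.mkdir(exist_ok=True)

for i, url in enumerate(urls):
    resp = requests.get(url, timeout=10)
    if resp.ok:  # skip failed downloads rather than crashing
        (out_dir / f"image_{i:05d}.jpg").write_bytes(resp.content)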
Step Three: Quality Control
Ensure the collected images are diverse and contain different backgrounds, lighting conditions, and variations in object poses and sizes. Aim for a balanced dataset with sufficient images for each class.
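Part of this check can be automated. Here is a minimal sketch using Pillow, assuming the collected images sit in a raw_images/ folder; it flags corrupt or very small files before they reach labeling:

import pathlib
from PIL import Image

MIN_SIDE = 32  # assumed threshold; tune for your use case

for path in sorted(pathlib.Path("raw_images").glob("*.jpg")):
    try:
        with Image.open(path) as img:
            w, h = img.size
            img.verify()  # raises on truncated or corrupt files
        if min(w, h) < MIN_SIDE:
            print(f"too small: {path} ({w}x{h})")
    except Exception as exc:
        print(f"corrupt: {path} ({exc})")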
Image Labeling
Of course, after you've collected your images, you need to label them.
Step Four: Choosing Labeling Tools
Select a suitable annotation tool to label objects in your images. Tools like LabelImg, VGG Image Annotator (VIA), or CVAT (Computer Vision Annotation Tool) are commonly used.
CVAT is an interactive video and image annotation tool for computer vision. It is used by tens of thousands of users and companies worldwide.
cvat.ai is an online version of CVAT. It's free, efficient, easy to use, and runs the latest version of the tool. You can create up to 10 tasks and upload up to 500 MB of data to annotate. Your data will be visible only to you and the people you assign to it.
Prebuilt docker images are the easiest way to start using CVAT locally. They are available on Docker Hub:
- cvat/server
- cvat/ui
The images have been downloaded more than 1M times so far.

Step Five: Annotation
Open the chosen tool and load the images. Follow these steps:
- Define bounding boxes around objects: Draw rectangles (bounding boxes) around objects belonging to the specified classes for each image.
- Assign class labels: Tag each bounding box with the corresponding class label (e.g., car, pedestrian, bicycle).
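Most of these tools can export annotations in YOLO format directly: one .txt file per image, one line per bounding box, with the class index followed by the box's center x, center y, width, and height, all normalized to [0, 1]. For example, an image containing one car (class 0) and one pedestrian (class 1) would get a label file like:

0 0.512 0.430 0.280 0.150
1 0.215 0.610 0.060 0.220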
Consistency and Accuracy
Ensure consistency in labeling across images to maintain accuracy. Follow labeling guidelines strictly and maintain uniformity in labeling style and size of bounding boxes.
Step Six: Quality Assurance
Regularly review annotated images to correct any errors or inconsistencies. Quality assurance is crucial for the reliability of the model.
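Some of this review can be automated. Here is a minimal sanity-check sketch, assuming YOLO-format .txt labels in a labels/ folder (set NUM_CLASSES to match your dataset):

import pathlib

NUM_CLASSES = 4  # must match 'nc' in your dataset YAML

for label_file in pathlib.Path("labels").glob("*.txt"):
    for line_no, line in enumerate(label_file.read_text().splitlines(), 1):
        parts = line.split()
        if len(parts) != 5:
            print(f"{label_file}:{line_no}: expected 5 fields, got {len(parts)}")
            continue
        cls, *coords = parts
        if not 0 <= int(cls) < NUM_CLASSES:
            print(f"{label_file}:{line_no}: class index {cls} out of range")
        if any(not 0.0 <= float(c) <= 1.0 for c in coords):
            print(f"{label_file}:{line_no}: coordinate outside [0, 1]")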
Data Preprocessing
Time for some preprocessing to help increase the variety and quality of the images.
Step Seven: Data Augmentation
Augment the dataset by applying transformations like rotation, flipping, scaling, and adjusting brightness. Data augmentation helps in enhancing model generalization and robustness.
Step Eight: Splitting The Data
Divide the labeled dataset into training, validation, and testing sets. An 80-10-10 split is typically used for training, validation, and testing, respectively.
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras import layers

# Load tf_flowers and split it 80-10-10 into train/val/test
(train_ds, val_ds, test_ds), metadata = tfds.load(
    'tf_flowers',
    split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
    with_info=True,
    as_supervised=True,
)

# Random flips and rotations, applied as a preprocessing pipeline
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.2),
])

# Take one sample image and add a batch dimension
image, label = next(iter(train_ds))
image = tf.cast(tf.expand_dims(image, 0), tf.float32)

# Visualize nine random augmentations of the same image
plt.figure(figsize=(10, 10))
for i in range(9):
    augmented_image = data_augmentation(image)
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(augmented_image[0] / 255)  # rescale to [0, 1] for display
    plt.axis("off")
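The snippet above uses the tf_flowers dataset for illustration. For your own YOLO dataset on disk, a minimal 80-10-10 split sketch might look like the following; it assumes images and same-named .txt label files in flat images/ and labels/ folders, and writes out the custom_dataset/ layout referenced in the YAML below:

import pathlib
import random
import shutil

random.seed(0)  # fixed seed so the split is reproducible
images = sorted(pathlib.Path("images").glob("*.jpg"))
random.shuffle(images)

n = len(images)
splits = {
    "train": images[: int(0.8 * n)],
    "val": images[int(0.8 * n): int(0.9 * n)],
    "test": images[int(0.9 * n):],
}

for split, files in splits.items():
    for img in files:
        label = pathlib.Path("labels") / (img.stem + ".txt")
        for src, subdir in [(img, "images"), (label, "labels")]:
            dst = pathlib.Path("custom_dataset") / split / subdir / src.name
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy(src, dst)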

Creating the YAML file for training
The YAML file is a config file that will be used as the input config to the YOLOv8 training. The final config file should look something like this:
dataset.yaml
train: custom_dataset/train/
val: custom_dataset/val/

# number of classes
nc: 4

# class names
names: ['closed_door', 'opened_door', 'bus', 'number']
Using W&B Artifacts To Upload The Dataset
Create a W&B account and install W&B using:
pip install wandb
Then log in using:
wandb login
To log the dataset, use:
import wandb

run = wandb.init(project="yolov8", job_type="add-dataset")
artifact = wandb.Artifact(name="my_data", type="dataset")
artifact.add_dir(local_path="/dataset/")  # Add dataset directory to artifact
run.log_artifact(artifact)
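Later runs can pull the same dataset back down with use_artifact; here is a short sketch, reusing the artifact name logged above:

import wandb

run = wandb.init(project="yolov8", job_type="train")
artifact = run.use_artifact("my_data:latest")  # fetch the dataset artifact
dataset_dir = artifact.download()  # local path to the downloaded files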

Installing YOLOv8
First, install the YOLOv8 model.
pip install ultralytics

YOLOv8 Detect, Segment, and Pose models pre-trained on the COCO dataset are available from Ultralytics, as are YOLOv8 Classify models pre-trained on the ImageNet dataset. Track mode is available for all Detect, Segment, and Pose models.
To train the model from scratch, try out the following:
from ultralytics import YOLO

# Load a model
model = YOLO("yolov8n.yaml")  # build a new model from scratch
model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

# Use the model
model.train(data="dataset.yaml", epochs=3)  # train the model
metrics = model.val()  # evaluate model performance on the validation set
results = model("path/to/test_image.jpg")  # predict on an image (replace with your own)
path = model.export(format="onnx")  # export the model to ONNX format
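If you want to inspect the validation results programmatically, the metrics object returned by model.val() exposes the standard mAP values (attribute names per the Ultralytics metrics API):

print(metrics.box.map)    # mAP50-95
print(metrics.box.map50)  # mAP50
print(metrics.box.map75)  # mAP75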
If you want to add the W&B model training logging, you can use the following functionality to add the metric logging to your training code conveniently:
import wandb
from wandb.integration.ultralytics import add_wandb_callback

# Add W&B callback for Ultralytics
add_wandb_callback(model, enable_model_checkpointing=True)

# Train/fine-tune your model
# At the end of each epoch, predictions on validation batches are logged
# to a W&B table with insightful and interactive overlays for
# computer vision tasks
model.train(project="yolov8", data="dataset.yaml", epochs=5, imgsz=640)
model.val()

# Finish the W&B run
wandb.finish()
Closing Up
Training a custom YOLOv8 object detection model requires a meticulous process of collecting, labeling, and preprocessing images. In this tutorial we've walked through each step, from identifying object classes and gathering diverse image datasets, to labeling images with precision and augmenting data for robust model training.
The use of advanced tools like CVAT for labeling and TensorFlow for data augmentation, along with the integration of W&B for dataset management and model training, simplifies and streamlines the process. The culmination of these efforts is the creation of a well-prepared dataset that can be used to train a YOLOv8 model efficiently.