Collect and Label Images to Train a YOLOv8 Object Detection Model
This tutorial walks you through preparing a dataset to train a custom YOLOv8 model, step by step.

Object detection models like YOLOv8 (You Only Look Once version 8) have revolutionized computer vision applications by enabling accurate real-time object detection in images and videos. However, the effectiveness of these models heavily depends on the quality and quantity of the training data used. Collecting and labeling images plays a crucial role in training a robust YOLOv8 model.
In this guide, we'll walk through the process of collecting images and labeling them for YOLOv8 model training.
YOLOv8 is used primarily for image classification, object detection, and image segmentation. Object detection is the most popular of these use cases, so it's the task this tutorial focuses on.

Here's what we'll be covering:
- Gathering Images
- Step One: Identify Object Classes
- Step Two: Data Collection
- Step Three: Quality Control
- Image Labeling
- Step Four: Choosing Labeling Tools
- Step Five: Annotation
- Step Six: Quality Assurance
- Data Preprocessing
- Step Seven: Data Augmentation
- Step Eight: Splitting The Data
- Creating the YAML file for training
- Using W&B Artifacts To Upload The Dataset
- Installing YOLOv8
- Closing Up
Let's get going.
Gathering Images
You can't label images you don't have, so the first step is collecting them. Doing this involves:
Step One: Identify Object Classes
Define the classes of objects you want the model to detect. For instance, if you're building a model to detect cars, pedestrians, and bicycles, identify these as your target classes.
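Since YOLO-format annotations and the training config identify classes by integer index, it helps to pin the mapping down early. Here is a minimal sketch using the example classes above:

# The order of this list fixes each class's integer index
CLASS_NAMES = ["car", "pedestrian", "bicycle"]
CLASS_TO_ID = {name: i for i, name in enumerate(CLASS_NAMES)}  # {"car": 0, "pedestrian": 1, "bicycle": 2}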
Step Two: Data Collection
Collect a diverse set of images containing the objects you want to detect. This can be done through various means:
Use public datasets: Numerous publicly available datasets like COCO, VOC, and Open Images contain labeled images for object detection.
- Google Dataset Search: Google launched this search engine to enable researchers to access datasets easily and quickly. It contains more than 25 million datasets.
- Kaggle Datasets: Kaggle helps the data science community access machine learning datasets. It is easily one of the best resources for this task.
- UCI Machine Learning Repository: Created in 1987, the UCI page is one of the oldest free dataset repositories in the world.
- Visual Data: As the name implies, this search engine contains datasets specifically for computer vision. It is a great source when looking for datasets related to classification, image segmentation, and image processing.
- Papers With Code: A community for free and open-source research projects containing code and datasets.
Scraping the web: Utilize web scraping tools or APIs to gather images from search engines, social media platforms, or specific websites relevant to your application.
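If part of your collection comes from URLs (for example, ones gathered by a scraper), a minimal download sketch with the requests library could look like this; the URL list and output folder are placeholders:

import pathlib
import requests

urls = ["https://example.com/img1.jpg", "https://example.com/img2.jpg"]  # placeholder URLs
out_dir = pathlib.Path("raw_images")
out_dir.mkdir(exist_ok=True)

for i, url in enumerate(urls):
    resp = requests.get(url, timeout=10)
    if resp.ok:  # skip failed downloads rather than crashing
        (out_dir / f"image_{i:05d}.jpg").write_bytes(resp.content)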
Step Three: Quality Control
Ensure the collected images are diverse and contain different backgrounds, lighting conditions, and variations in object poses and sizes. Aim for a balanced dataset with sufficient images for each class.
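Part of this check can be automated. Here is a minimal sketch using Pillow, assuming the collected images sit in a raw_images/ folder; it flags corrupt or very small files before they reach labeling:

import pathlib
from PIL import Image

MIN_SIDE = 32  # assumed threshold; tune for your use case

for path in sorted(pathlib.Path("raw_images").glob("*.jpg")):
    try:
        with Image.open(path) as img:
            w, h = img.size
            img.verify()  # raises on truncated or corrupt files
        if min(w, h) < MIN_SIDE:
            print(f"too small: {path} ({w}x{h})")
    except Exception as exc:
        print(f"corrupt: {path} ({exc})")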
Image Labeling
Of course, after you've collected your images, you need to label them.
Step Four: Choosing Labeling Tools
Select a suitable annotation tool to label objects in your images. Tools like LabelImg, VGG Image Annotator (VIA), or CVAT (Computer Vision Annotation Tool) are commonly used.
CVAT is an interactive video and image annotation tool for computer vision. It is used by tens of thousands of users and companies worldwide.
cvat.ai is an online version of CVAT. It's free, efficient, easy to use, and runs the latest version of the tool. You can create up to 10 tasks and upload up to 500 MB of data to annotate. Your data will be visible only to you and the people you assign to it.
Prebuilt docker images are the easiest way to start using CVAT locally. They are available on Docker Hub:
- cvat/server
- cvat/ui
The images have been downloaded more than 1M times so far.

Step Five: Annotation
Open the chosen tool and load the images. Follow these steps:
- Define bounding boxes around objects: Draw rectangles (bounding boxes) around objects belonging to the specified classes for each image.
- Assign class labels: Tag each bounding box with the corresponding class label (e.g., car, pedestrian, bicycle).
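Most of these tools can export annotations in YOLO format directly: one .txt file per image, one line per bounding box, with the class index followed by the box's center x, center y, width, and height, all normalized to [0, 1]. For example, an image containing one car (class 0) and one pedestrian (class 1) would get a label file like:

0 0.512 0.430 0.280 0.150
1 0.215 0.610 0.060 0.220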
Consistency and Accuracy
Ensure consistency in labeling across images to maintain accuracy. Follow labeling guidelines strictly and maintain uniformity in labeling style and size of bounding boxes.
Step Six: Quality Assurance
Regularly review annotated images to correct any errors or inconsistencies. Quality assurance is crucial for the reliability of the model.
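Some of this review can be automated. Here is a minimal sanity-check sketch, assuming YOLO-format .txt labels in a labels/ folder (set NUM_CLASSES to match your dataset):

import pathlib

NUM_CLASSES = 4  # must match 'nc' in your dataset YAML

for label_file in pathlib.Path("labels").glob("*.txt"):
    for line_no, line in enumerate(label_file.read_text().splitlines(), 1):
        parts = line.split()
        if len(parts) != 5:
            print(f"{label_file}:{line_no}: expected 5 fields, got {len(parts)}")
            continue
        cls, *coords = parts
        if not 0 <= int(cls) < NUM_CLASSES:
            print(f"{label_file}:{line_no}: class index {cls} out of range")
        if any(not 0.0 <= float(c) <= 1.0 for c in coords):
            print(f"{label_file}:{line_no}: coordinate outside [0, 1]")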
Data Preprocessing
Time for some preprocessing to help increase the variety and quality of the images.
Step Seven: Data Augmentation
Augment the dataset by applying transformations like rotation, flipping, scaling, and adjusting brightness. Data augmentation helps in enhancing model generalization and robustness.
Step Eight: Splitting The Data
Divide the labeled dataset into training, validation, and testing sets. An 80-10-10 split is typically used for training, validation, and testing, respectively.
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras import layers

# Load tf_flowers and split it 80-10-10 into train/val/test
(train_ds, val_ds, test_ds), metadata = tfds.load(
    'tf_flowers',
    split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
    with_info=True,
    as_supervised=True,
)

# Random flips and rotations, applied as a preprocessing pipeline
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.2),
])

# Take one sample image and add a batch dimension
image, label = next(iter(train_ds))
image = tf.cast(tf.expand_dims(image, 0), tf.float32)

# Visualize nine random augmentations of the same image
plt.figure(figsize=(10, 10))
for i in range(9):
    augmented_image = data_augmentation(image)
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(augmented_image[0] / 255)  # rescale to [0, 1] for display
    plt.axis("off")
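The snippet above uses the tf_flowers dataset for illustration. For your own YOLO dataset on disk, a minimal 80-10-10 split sketch might look like the following; it assumes images and same-named .txt label files in flat images/ and labels/ folders, and writes out the custom_dataset/ layout referenced in the YAML below:

import pathlib
import random
import shutil

random.seed(0)  # fixed seed so the split is reproducible
images = sorted(pathlib.Path("images").glob("*.jpg"))
random.shuffle(images)

n = len(images)
splits = {
    "train": images[: int(0.8 * n)],
    "val": images[int(0.8 * n): int(0.9 * n)],
    "test": images[int(0.9 * n):],
}

for split, files in splits.items():
    for img in files:
        label = pathlib.Path("labels") / (img.stem + ".txt")
        for src, subdir in [(img, "images"), (label, "labels")]:
            dst = pathlib.Path("custom_dataset") / split / subdir / src.name
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy(src, dst)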

Creating the YAML file for training
The YAML file is a config file that will be used as the input config to the YOLOv8 training. The final config file should look something like this:
dataset.yaml
train: custom_dataset/train/
val: custom_dataset/val/

# number of classes
nc: 4

# class names
names: ['closed_door', 'opened_door', 'bus', 'number']
Using W&B Artifacts To Upload The Dataset
Create a W&B account and install W&B using:
pip install wandb
Then log in using:
wandb login
To log the dataset, use:
import wandb

run = wandb.init(project="yolov8", job_type="add-dataset")
artifact = wandb.Artifact(name="my_data", type="dataset")
artifact.add_dir(local_path="/dataset/")  # Add dataset directory to artifact
run.log_artifact(artifact)
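Later runs can pull the same dataset back down with use_artifact; here is a short sketch, reusing the artifact name logged above:

import wandb

run = wandb.init(project="yolov8", job_type="train")
artifact = run.use_artifact("my_data:latest")  # fetch the dataset artifact
dataset_dir = artifact.download()  # local path to the downloaded files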

Installing YOLOv8
First, install the YOLOv8 model.
pip install ultralytics

YOLOv8 Detect, Segment, and Pose models pre-trained on the COCO dataset are available from Ultralytics, as are YOLOv8 Classify models pre-trained on the ImageNet dataset. Track mode is available for all Detect, Segment, and Pose models.
To train the model from scratch, try out the following:
from ultralytics import YOLO

# Load a model
model = YOLO("yolov8n.yaml")  # build a new model from scratch
model = YOLO("yolov8n.pt")  # load a pretrained model (recommended for training)

# Use the model
model.train(data="dataset.yaml", epochs=3)  # train the model
metrics = model.val()  # evaluate model performance on the validation set
results = model("path/to/test_image.jpg")  # predict on an image (replace with your own)
path = model.export(format="onnx")  # export the model to ONNX format
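If you want to inspect the validation results programmatically, the metrics object returned by model.val() exposes the standard mAP values (attribute names per the Ultralytics metrics API):

print(metrics.box.map)    # mAP50-95
print(metrics.box.map50)  # mAP50
print(metrics.box.map75)  # mAP75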
If you want to add the W&B model training logging, you can use the following functionality to add the metric logging to your training code conveniently:
import wandb
from wandb.integration.ultralytics import add_wandb_callback

# Add W&B callback for Ultralytics
add_wandb_callback(model, enable_model_checkpointing=True)

# Train/fine-tune your model
# At the end of each epoch, predictions on validation batches are logged
# to a W&B table with insightful and interactive overlays for
# computer vision tasks
model.train(project="yolov8", data="dataset.yaml", epochs=5, imgsz=640)
model.val()

# Finish the W&B run
wandb.finish()
Closing Up
Training a custom YOLOv8 object detection model requires a meticulous process of collecting, labeling, and preprocessing images. In this tutorial we've walked through each step, from identifying object classes and gathering diverse image datasets, to labeling images with precision and augmenting data for robust model training.
The use of advanced tools like CVAT for labeling and TensorFlow for data augmentation, along with the integration of W&B for dataset management and model training, simplifies and streamlines the process. The culmination of these efforts is the creation of a well-prepared dataset that can be used to train a YOLOv8 model efficiently.