
Using YOLOv5 with W&B to Detect Starfish in the Great Barrier Reef

How I used YOLOv5 and Weights & Biases to detect crown-of-thorns starfish (COTS) in underwater images
Created on July 21 | Last edited on August 2
What we'll be looking for today: the crown-of-thorns starfish

🧩 Introduction

Australia's stunningly beautiful Great Barrier Reef is the world’s largest coral reef. It's home to 1,500 species of fish, 400 species of corals, 130 species of sharks and rays, and a massive variety of other sea life.
Unfortunately, the reef is under threat, in part because of the overpopulation of one particular starfish – the coral-eating crown-of-thorns starfish (or COTS for short). Scientists, tourism operators, and reef managers established a large-scale intervention program to control COTS outbreaks to ecologically sustainable levels.
The goal of this report? To build an object detection model, specifically with YOLOv5 and W&B, to identify these starfish.
NOTE: This competition was hosted by Kaggle in 2022. Feel free to check it out here!


πŸ““ Code and notebooks

Just a note: these links will send you off to Kaggle.

πŸ›  Install libraries

First, we need to install W&B.
!pip install -qU wandb
And we're done. That's always satisfying.

πŸ“Œ One key point before we jump in

The Kaggle competition metric, F2, tolerates some false positives (FP) in order to ensure that very few starfish are missed. In other words, reducing false negatives (FN) matters more than reducing false positives (FP).
F_2 = 5 \cdot \frac{precision \cdot recall}{4 \cdot precision + recall}
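To make the recall weighting concrete, here's a minimal sketch of the general F-beta score with beta = 2 (the helper name and example numbers are ours, not part of the competition code):
def f_beta(precision, recall, beta=2.0):
    """Generic F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Missing starfish (low recall) hurts the score far more than a few extra false positives
print(f_beta(precision=0.6, recall=0.9))  # ~0.82
print(f_beta(precision=0.9, recall=0.6))  # ~0.64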

πŸͺ„ Initializing W&B

Let's initialize Weights & Biases for this experiment. If this is your first time using W&B or you are not logged in, the link that appears after running wandb.login() will take you to the sign-up/login page. Signing up for a free account takes just a few clicks.
import wandb
wandb.login()
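If you're running on Kaggle, you can avoid pasting the API key by storing it as a notebook secret. A minimal sketch, assuming you've saved your key under the (hypothetical) secret name wandb_api_key via Add-ons β†’ Secrets:
import wandb
from kaggle_secrets import UserSecretsClient

# Read the W&B API key from Kaggle's secret store and log in non-interactively
api_key = UserSecretsClient().get_secret("wandb_api_key")
wandb.login(key=api_key)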

πŸ“ Metadata

Here's what we're working with:
  • train_images/: The folder containing training set photos of the form video_{video_id}/{video_frame}.jpg
  • video_id: The ID number of the video the image was part of. (These video IDs are not meaningfully ordered.)
  • video_frame: The frame number of the image within the video. Note: you should expect to see occasional gaps in the frame number from when the diver surfaced.
  • sequence: The ID of a gap-free subset of a given video. The sequence ids are not meaningfully ordered.
  • sequence_frame: The frame number within a given sequence.
  • image_id: The ID code for the image, in the format {video_id}-{video_frame}
  • annotations: The bounding boxes of any starfish detections, stored as a string that can be evaluated directly with Python (a parsing sketch follows this list). This does not use the same format as the predictions you will submit, and it is not available in test.csv. Each bounding box is described by the pixel coordinates (x_min, y_min) of its top-left corner within the image, together with its width and height in pixels (COCO format).
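For illustration, here's a hedged sketch of parsing a single annotations value with ast.literal_eval (a safer stand-in for the eval call used later); the sample string is made up, and it assumes each box is a dict with x, y, width, and height keys:
import ast

# A made-up example of one value from the `annotations` column:
# a Python-literal string holding a list of COCO-style pixel boxes
raw = "[{'x': 559, 'y': 213, 'width': 50, 'height': 32}]"

boxes = ast.literal_eval(raw)  # -> list of dicts, one per starfish
print(boxes[0]['x'], boxes[0]['y'], boxes[0]['width'], boxes[0]['height'])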
FOLD = 1  # which fold to train on
DIM = 3000  # training image size
MODEL = 'yolov5s'
BATCH = 4
EPOCHS = 15
OPTIMIZER = 'Adam'

PROJECT = 'great-barrier-reef'  # W&B project name
NAME = f'{MODEL}-dim{DIM}-fold{FOLD}'  # W&B run name

REMOVE_NOBBOX = True  # remove images with no bounding boxes
ROOT_DIR = '/kaggle/input/tensorflow-great-barrier-reef/'
IMAGE_DIR = '/kaggle/images'  # directory to save images
LABEL_DIR = '/kaggle/labels'  # directory to save labels

Extracting our metadata

We'll use this code to extract our metadata:
import pandas as pd
from tqdm import tqdm
tqdm.pandas()  # enables .progress_apply on pandas objects

# Train data
df = pd.read_csv(f'{ROOT_DIR}/train.csv')
df['old_image_path'] = f'{ROOT_DIR}/train_images/video_' + df.video_id.astype(str) + '/' + df.video_frame.astype(str) + '.jpg'
df['image_path'] = f'{IMAGE_DIR}/' + df.image_id + '.jpg'
df['label_path'] = f'{LABEL_DIR}/' + df.image_id + '.txt'

# Parse the annotation strings into Python lists and count boxes per image
df['annotations'] = df['annotations'].progress_apply(eval)
df['num_bbox'] = df['annotations'].progress_apply(lambda x: len(x))

# All frames share the same resolution
df['width'] = 1280
df['height'] = 720
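Note that the FOLD constant above assumes a fold column on the dataframe, which the snippet doesn't create. A minimal sketch of one common way to build it (not necessarily the exact split used in the original notebook) is a GroupKFold on video_id, so frames from the same video never land in both train and validation:
from sklearn.model_selection import GroupKFold

# Optionally drop frames without any starfish, as controlled by the config flag above
if REMOVE_NOBBOX:
    df = df.query('num_bbox > 0').reset_index(drop=True)

# Assign a fold id per row; grouping by video keeps all frames of a video in one fold
df['fold'] = -1
gkf = GroupKFold(n_splits=5)  # 5 folds is an arbitrary choice here
for fold, (_, val_idx) in enumerate(gkf.split(df, groups=df['video_id'])):
    df.loc[df.index[val_idx], 'fold'] = fold

train_df = df[df['fold'] != FOLD].reset_index(drop=True)
valid_df = df[df['fold'] == FOLD].reset_index(drop=True)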

🍚 Preparing our dataset

You can refer to the training notebook for a detailed explanation of how to prepare the dataset for YOLOv5, but in brief we need to do the following:

Creating labels for YOLOv5

We need to export our labels to YOLO format, with one `*.txt` file per image (if an image has no objects, no `*.txt` file is required). The `*.txt` file specifications are:
  • One row per object
  • Each row is in `class x_center y_center width height` format.
  • Box coordinates must be in normalized `xywh` format (from `0` to `1`). If your boxes are in pixels, divide `x_center` and `width` by the image width, and `y_center` and `height` by the image height.
  • Class numbers are zero-indexed (start from `0`).
The bounding box format in this dataset is COCO, i.e. `[x_min, y_min, width, height]`, so we need to convert from COCO to YOLO format.
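As a concrete illustration of that conversion, here's a small helper (our own sketch, not code from the notebook) that turns one COCO box into a YOLO label row using the 1280Γ—720 frame size noted earlier:
def coco_to_yolo(x_min, y_min, w, h, img_w=1280, img_h=720, cls=0):
    """Convert a COCO box (pixel x_min, y_min, width, height)
    into a YOLO label row: class x_center y_center width height, all normalized."""
    x_center = (x_min + w / 2) / img_w
    y_center = (y_min + h / 2) / img_h
    return f"{cls} {x_center:.6f} {y_center:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# Example: a 50x32 px box whose top-left corner sits at (559, 213) in a 1280x720 frame
print(coco_to_yolo(559, 213, 50, 32))  # -> "0 0.456250 0.318056 0.039062 0.044444"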

Configuration

The dataset config file requires the following:
  1. The dataset root directory path and relative paths to `train / val / test` image directories (or *.txt files with image paths)
  2. The number of classes `nc`
  3. A list of class names: `['cots']`
Modify the YAML file below to match your dataset. The config file may look something like this:
names:
- cots
nc: 1
path: /kaggle/working
train: /kaggle/working/train.txt
val: /kaggle/working/val.txt
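One way to generate both this gbr.yaml and the train.txt / val.txt image lists from Python is sketched below; it assumes the train_df / valid_df split from the earlier fold sketch, and the file paths match the --data argument used in the training command later:
import yaml

# Write the image-path lists that YOLOv5 will read for training and validation
with open('/kaggle/working/train.txt', 'w') as f:
    f.write('\n'.join(train_df['image_path'].tolist()))
with open('/kaggle/working/val.txt', 'w') as f:
    f.write('\n'.join(valid_df['image_path'].tolist()))

# Write the dataset config consumed by train.py via --data
config = {
    'path': '/kaggle/working',
    'train': '/kaggle/working/train.txt',
    'val': '/kaggle/working/val.txt',
    'nc': 1,
    'names': ['cots'],
}
with open('/kaggle/working/gbr.yaml', 'w') as f:
    yaml.dump(config, f, default_flow_style=False)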

πŸ“¦ YOLOv5 +πŸͺ„ W&B

To install YOLOv5 with the W&B integration, all we need to do is clone this repo. The repo is synced with the official YOLOv5 repo, so you'll get the latest YOLOv5 features along with W&B support.
This is a community fork of YOLOv5, so you're welcome to contribute if you want to add any W&B features to YOLOv5!
!git clone https://github.com/awsaf49/yolov5-wandb.git yolov5 # clone
%cd yolov5
%pip install -qr requirements.txt # install
import torch
import utils
display = utils.notebook_init()  # run environment and dependency checks

πŸš… Training

The code we'll use today:
!python train.py --img {DIM} \
    --batch {BATCH} \
    --epochs {EPOCHS} \
    --optimizer {OPTIMIZER} \
    --data /kaggle/working/gbr.yaml \
    --weights {MODEL}.pt \
    --project {PROJECT} --name {NAME} --entity ml-colabs \
    --exist-ok
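Once training finishes, YOLOv5 writes the best checkpoint to {PROJECT}/{NAME}/weights/best.pt (relative to the yolov5 directory). Here's a minimal inference sketch using torch.hub from inside that directory; the example image name and confidence threshold are arbitrary choices of ours:
import torch

# Load the best checkpoint from this run as a custom YOLOv5 model
model = torch.hub.load('.', 'custom', path=f'{PROJECT}/{NAME}/weights/best.pt', source='local')
model.conf = 0.15  # a low confidence threshold favours recall, which the F2 metric rewards

# Run inference on one frame (hypothetical image id) and inspect the predicted boxes
results = model(f'{IMAGE_DIR}/0-100.jpg', size=DIM)
print(results.pandas().xyxy[0])  # one row per detected box with pixel coordinates and confidence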

πŸ” Results

Class Distribution

(W&B panel: class distribution for the 3 runs in the run set)

Batch Label/Prediction

(W&B panel: batch labels vs. predictions for the 3 runs in the run set)

Metrics

From the metrics below, it is quite evident that a large image size is a key factor for improving performance: the small yolov5s model trained at a large image size (3000Γ—3000) outperforms the medium-sized yolov5m model trained at a lower image size (1280Γ—1280).
We can also see that at the same image size (1280Γ—1280), yolov5m outperforms yolov5s6, which means model size is another key factor for performance.
To get the best result, we would need to increase both the image size and the model size, though doing so will also increase the computational cost.
(W&B panels: evaluation metrics for the 3 runs in the run set)

Loss

(W&B panel: training and validation loss curves for the 3 runs in the run set)

πŸ“Š Summary

  • Accurately identified starfish in real time using a YOLOv5 object detection model trained on underwater videos of coral reefs.
  • Demonstrated how to use YOLOv5 with W&B (Weights & Biases).
  • Analyzed the results and identified two key factors for improving performance: Large Image Size and Large Model Size.