Bounding Boxes for Object Detection

How to log and explore bounding boxes. Made by Stacey Svetlichnaya using Weights & Biases

Introduction

If you're training models for object detection, you can interactively visualize bounding boxes in Weights & Biases. This short demo focuses on driving scenes, testing a YoloV3 net pretrained on MSCOCO on images from the Berkeley Deep Drive 100K dataset. The API for logging bounding boxes is flexible and intuitive. Below, I explain the interaction controls for this tool and a few ways you might use it to analyze your models.

This approach can help with object detection on many other kinds of images, from microscope slides to x-rays to satellite imagery and beyond. You can read more about understanding driving scenes in this report and more about Lyft's self-driving car dataset in this report.

High-level view: Many examples on validation data

Zooming in: Different classes in a specific model

Controls

If you click on the Settings icon in the top left corner of a media panel, you will see a pop-up with controls for interacting with the images.

Code

You can find the full API documentation here. It enables flexible logging in many configurations.
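For instance, you can attach multiple named groups of boxes to a single image, such as model predictions alongside ground truth labels. Here is a minimal sketch of that pattern; the image path, coordinates, class labels, and score below are placeholder values, not outputs from this demo:

import wandb

wandb.init(project="bounding-box-demo")

# one predicted box and one ground-truth box on the same image
# (image path, coordinates, and labels are placeholder values)
image = wandb.Image("my_image.png", boxes={
    "predictions": {
        "box_data": [{
            "position": {"minX": 50, "maxX": 200, "minY": 30, "maxY": 180},
            "domain": "pixel",  # pixel coordinates; omit to use fractional (0-1) coordinates
            "class_id": 0,
            "box_caption": "car (0.870)",
            "scores": {"score": 0.87}}],
        "class_labels": {0: "car"}},
    "ground_truth": {
        "box_data": [{
            "position": {"minX": 45, "maxX": 210, "minY": 25, "maxY": 185},
            "domain": "pixel",
            "class_id": 0}],
        "class_labels": {0: "car"}}})
wandb.log({"driving scene": image})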

Here is the logging code I use in this report. My pretrained YoloV3 model returns three lists per image: box coordinates (v_boxes), class labels (v_labels), and confidence scores (v_scores). The function below takes these lists, the filename of the input validation image, and the width & height at which to log it:

import wandb
# load_img is assumed to be Keras's image loader, as used in the YoloV3 code this builds on
from keras.preprocessing.image import load_img

# this is the order in which my classes will be displayed
display_ids = {"car" : 0, "truck" : 1, "person" : 2, "traffic light" : 3, "stop sign" : 4,
               "bus" : 5, "bicycle": 6, "motorbike" : 7, "parking meter" : 8, "bench": 9,
               "fire hydrant" : 10, "aeroplane" : 11, "boat" : 12, "train": 13}
# this is a reverse map of the integer class id to the string class label
class_id_to_label = { int(v) : k for k, v in display_ids.items()}

def bounding_boxes(filename, v_boxes, v_labels, v_scores, log_width, log_height):
    # load raw input photo
    raw_image = load_img(filename, target_size=(log_height, log_width))
    all_boxes = []
    # assemble the data for each bounding box in this image
    for b_i, box in enumerate(v_boxes):
        # get coordinates and labels
        box_data = {
            "position" : {
                "minX" : box.xmin,
                "maxX" : box.xmax,
                "minY" : box.ymin,
                "maxY" : box.ymax},
            "class_id" : display_ids[v_labels[b_i]],
            # optionally caption each box with its class and score
            "box_caption" : "%s (%.3f)" % (v_labels[b_i], v_scores[b_i]),
            # interpret coordinates as pixels (the default is fractions of the image size)
            "domain" : "pixel",
            "scores" : { "score" : v_scores[b_i] }}
        all_boxes.append(box_data)

    # log to wandb: raw image, predictions, and dictionary of class labels for each class id
    box_image = wandb.Image(raw_image, boxes = {"predictions": {"box_data": all_boxes, "class_labels" : class_id_to_label}})
    return box_image
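
To render the annotated images in the W&B UI, pass the returned wandb.Image objects to wandb.log. In this sketch, val_images and yolo_predict are hypothetical stand-ins for your own validation data and inference code, and the 416 x 416 logging size is an assumption:

# gather annotated validation images and log them to a single media panel
# (val_images and yolo_predict are hypothetical; substitute your own data and model)
log_images = []
for filename in val_images:
    v_boxes, v_labels, v_scores = yolo_predict(filename)
    log_images.append(bounding_boxes(filename, v_boxes, v_labels, v_scores, 416, 416))
wandb.log({"predictions": log_images})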