
YOLO-NAS: SoTA Foundation Model for Object Detection

YOLO-NAS is a new foundation model for object detection that sets a new standard for state-of-the-art object detection.
Created on May 10 | Last edited on August 29

🎬 Introduction

Object detection is a crucial task in computer vision, enabling machines to recognize and locate objects within images or videos. In recent times, the use of Deep Learning Models like YOLO, SSD, and CenterNet has revolutionized how machines perceive and interpret the world around them.
YOLOv1, proposed in the 2016 paper You Only Look Once: Unified, Real-Time Object Detection, has been one of the most popular approaches to object detection and has since spawned several versions with steadily improving performance. The core idea of YOLO-based models is to formulate object detection as a single regression problem: the image is divided into a grid, and bounding boxes and class probabilities are predicted simultaneously in a single forward pass.
YOLO-NAS is a new foundation model for object detection developed by Deci AI and is the latest addition to the YOLO family of models. YOLO-NAS pushes the boundaries of YOLO-based object detection models by not only beating existing similar models in terms of efficiency and accuracy but also ensuring optimized performance for production usage.
In this report, we will take a closer look at this model, learn how to use it, and see how to get the most out of our experiments with YOLO-NAS using Weights & Biases.

Performance of YOLO-NAS visualized with Weights & Biases.




🤿 A Deep Dive into YOLO-NAS

  • YOLO-NAS uses the QSP and QCI blocks suggested by the paper Make RepVGG Greater Again: A Quantization-aware Approach to combine re-parameterization and 8-bit quantization advantages. The usage of these blocks allows for minimal accuracy loss during post-training quantization.
  • Deci AI used AutoNAC, their own proprietary NAS technology, to determine the optimal sizes and structures of stages, including block type, number of blocks, and number of channels in each stage.
  • YOLO-NAS uses a hybrid quantization method that selectively quantizes certain parts of a model, reducing information loss and balancing latency and accuracy.
    • Standard quantization affects all model layers, often leading to significant accuracy loss.
    • The hybrid method used by Deci AI optimizes quantization to maintain accuracy by only quantizing certain layers while leaving others untouched.
    • The layer selection algorithm from AutoNAC considers each layer’s impact on accuracy and latency, as well as the effects of switching between 8-bit and 16-bit quantization on overall latency.
    • Designed specifically for production use, YOLO-NAS is fully compatible with high-performance inference engines like NVIDIA TensorRT and supports INT8 quantization for unprecedented runtime performance. This allows YOLO-NAS to excel in real-world scenarios, such as autonomous vehicles, robotics, and video analytics applications, where low latency and efficient processing are essential.
  • YOLO-NAS was trained on Objects365, a diverse dataset for object detection consisting of 2 million images across 365 categories with 30 million bounding boxes. YOLO-NAS architecture also incorporates Knowledge Distillation and Distribution Focal Loss to enhance its training process.
  • YOLO-NAS was also trained on the RoboFlow100 dataset or RF100, a collection of 100 datasets from diverse domains, to demonstrate its ability to handle complex object detection tasks.
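Of the training ingredients above, Distribution Focal Loss (DFL) is straightforward to illustrate: instead of regressing a box coordinate directly, the detection head predicts a discrete distribution over integer bins, and the loss concentrates probability mass on the two bins that bracket the continuous target. The snippet below is a minimal pure-Python sketch of the general DFL formulation, not Deci AI's exact implementation:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distribution_focal_loss(logits, y):
    """DFL for a single box coordinate.

    `logits` are scores over integer bins 0..n-1, and `y` is the continuous
    regression target measured in bin units. The loss is a weighted
    cross-entropy on the two bins bracketing y, with weights given by
    linear interpolation between them.
    """
    probs = softmax(logits)
    i = int(math.floor(y))
    i = min(max(i, 0), len(logits) - 2)  # keep bins i and i+1 in range
    w_right = y - i        # weight on bin i+1
    w_left = (i + 1) - y   # weight on bin i
    return -(w_left * math.log(probs[i]) + w_right * math.log(probs[i + 1]))
```

A distribution peaked around the target yields a lower loss than a flat one, which is what drives the head toward sharp, well-localized box predictions.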
High-Level Architecture Overview of YOLO-NAS. Source: Figure 2 from https://deci.ai/blog/yolo-nas-object-detection-foundation-model/

To learn more about the architecture and training protocols behind YOLO-NAS, you can refer to the official blog post from Deci AI.

To learn more about Knowledge Distillation, a model optimization technique crucial to the development of YOLO-NAS, you can check out the following Weights & Biases report by Sayak Paul...




🧠 Using YOLO-NAS for Prediction

The YOLO-NAS architecture is available under an open-source license, and its pre-trained weights are available for non-commercial research use through SuperGradients, a PyTorch-based, open-source computer vision training library developed by Deci AI. In this section, we will explore how to perform inference on our own images with YOLO-NAS and log the results to Weights & Biases as interactive bounding-box image overlays.
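As a concrete sketch of the logging step: W&B renders interactive overlays when an image is logged with a `boxes` payload in a specific dictionary format. The helper below is a hypothetical name (not part of any library) that converts pixel-space detections into that format; the SuperGradients inference calls are left in comments since they require the package and a pretrained-weights download.

```python
# Convert detector output (pixel-space xyxy boxes) into the dictionary
# format that wandb.Image expects for interactive bounding-box overlays.

def to_wandb_boxes(boxes, labels, scores, class_names, width, height):
    """boxes: list of (x1, y1, x2, y2) in pixels; returns a W&B `boxes` payload."""
    box_data = []
    for (x1, y1, x2, y2), label, score in zip(boxes, labels, scores):
        box_data.append({
            # W&B accepts relative (0-1) coordinates as minX/maxX/minY/maxY.
            "position": {"minX": x1 / width, "maxX": x2 / width,
                         "minY": y1 / height, "maxY": y2 / height},
            "class_id": int(label),
            "box_caption": f"{class_names[label]} {score:.2f}",
            "scores": {"confidence": float(score)},
        })
    return {"predictions": {
        "box_data": box_data,
        "class_labels": {i: name for i, name in enumerate(class_names)},
    }}

# Usage sketch (assumes super-gradients and wandb are installed):
# from super_gradients.training import models
# import wandb
# model = models.get("yolo_nas_l", pretrained_weights="coco")
# pred = model.predict("street.jpg")
# ...extract boxes / labels / scores from `pred`, then:
# wandb.log({"predictions": wandb.Image(image, boxes=to_wandb_boxes(...))})
```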

YOLO-NAS Predictions logged to Weights & Biases using Interactive Bounding-box overlay.




🔧 Fine-tuning YOLO-NAS on your Dataset

In this section, we will see how to fine-tune a pre-trained variant of YOLO-NAS on a custom dataset using the SuperGradients library while tracking experiments, logging and versioning model checkpoints, and visualizing the detection datasets and prediction results during training.
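The fine-tuning workflow can be sketched with SuperGradients' `Trainer`. The snippet below is a minimal configuration sketch, not a complete recipe: the dataloader, loss, and metric objects are elided, and the hyperparameter values are illustrative assumptions rather than the ones used in this report. The `sg_logger="wandb_sg_logger"` setting is what routes SuperGradients' metrics and checkpoints to Weights & Biases.

```python
# Configuration sketch for fine-tuning YOLO-NAS with SuperGradients and
# logging to Weights & Biases. Hyperparameter values are illustrative.
from super_gradients import Trainer
from super_gradients.training import models

NUM_CLASSES = 3  # e.g., the number of categories in your custom dataset

trainer = Trainer(experiment_name="yolo_nas_finetune", ckpt_root_dir="checkpoints")

# Start from COCO-pretrained weights, replacing the head for NUM_CLASSES.
model = models.get("yolo_nas_s", num_classes=NUM_CLASSES, pretrained_weights="coco")

train_params = {
    "max_epochs": 25,
    "initial_lr": 5e-4,
    "optimizer": "AdamW",
    "lr_warmup_epochs": 3,
    "mixed_precision": True,
    # Route experiment tracking and checkpoint logging to Weights & Biases.
    "sg_logger": "wandb_sg_logger",
    "sg_logger_params": {
        "project_name": "yolo-nas-finetuning",
        "save_checkpoints_remote": True,
    },
    # The detection loss and validation metrics (e.g., PPYoloELoss,
    # DetectionMetrics_050) are omitted here; see the SuperGradients
    # documentation for full training recipes.
}

# trainer.train(model=model, training_params=train_params,
#               train_loader=train_loader, valid_loader=valid_loader)
```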
