
TAP-Vid: DeepMind's New Dataset & Benchmark For Point Tracking

DeepMind has introduced a new dataset and benchmark for tracking points through video.
Created on November 8 | Last edited on November 8
Today, DeepMind introduced TAP-Vid, a new dataset featuring videos annotated with tracked points. The dataset and its related materials were created as part of a paper accepted to NeurIPS 2022. The datasets are freely downloadable, and a GitHub repository is available to guide anyone looking to use them.


TAP: Tracking Any Point

The researchers behind TAP-Vid identified a lack of datasets of this kind (video annotated with tracked points) and set out to fill that void. A point-based approach to spatial tracking in video scenes captures 3D structure far better than the standard bounding-box approach: points can move relative to one another and can be occluded by objects, revealing the underlying 3D geometry of a scene. Point tracking is also a common technique for producing realistic special effects and 3D CGI.
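To make the task concrete, here is a minimal sketch of what point-track annotations and a simple evaluation might look like. The array names, shapes, and the 4-pixel threshold are illustrative assumptions, not the paper's exact format or metric: each query point gets an (x, y) position per frame plus a per-frame occlusion flag, and we count how often a prediction lands near the ground truth while the point is visible.

```python
import numpy as np

# Hypothetical shapes: num_points tracked points over num_frames frames.
# gt_xy / pred_xy: (num_points, num_frames, 2) pixel coordinates.
# gt_occluded: (num_points, num_frames) bool, True when the point is hidden.
rng = np.random.default_rng(0)
num_points, num_frames = 5, 24
gt_xy = rng.uniform(0, 256, size=(num_points, num_frames, 2))
pred_xy = gt_xy + rng.normal(scale=2.0, size=gt_xy.shape)
gt_occluded = rng.random((num_points, num_frames)) < 0.2

def position_accuracy(pred, gt, occluded, threshold=4.0):
    """Fraction of visible point-frame pairs predicted within `threshold` pixels."""
    dist = np.linalg.norm(pred - gt, axis=-1)  # (num_points, num_frames)
    visible = ~occluded
    return (dist[visible] < threshold).mean()

print(f"accuracy@4px: {position_accuracy(pred_xy, gt_xy, gt_occluded):.3f}")
```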
TAP-Vid is available in a few different subsets: TAP-Vid-Kinetics, TAP-Vid-DAVIS, and TAP-Vid-RGB-Stacking (based on the Kinetics dataset, the DAVIS dataset, and the RGB-Stacking simulator, respectively). For the full benchmark, Kubric is also used: a synthetic dataset that can generate an arbitrary number of point-tracking annotations.
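If you want to poke at one of the released files, a loading sketch might look like the following. The file name and dictionary keys are assumptions based on the GitHub repository's download instructions, so check the repo for the exact format of each subset.

```python
import pickle

# Assumed file name from the repository's downloads; adjust to your local path.
with open("tapvid_davis.pkl", "rb") as f:
    data = pickle.load(f)

# Assumed layout: a dict keyed by video name, each entry holding frames,
# query-point tracks, and per-frame occlusion flags.
for name, example in data.items():
    video = example["video"]        # assumed (num_frames, height, width, 3) uint8 frames
    points = example["points"]      # assumed (num_points, num_frames, 2) coordinates
    occluded = example["occluded"]  # assumed (num_points, num_frames) bool
    print(name, video.shape, points.shape, occluded.shape)
    break
```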

Find out more

Pages with examples are available for the DAVIS, RGB-Stacking, and Kubric datasets.
Direct downloads for the datasets, along with instructions, are available in the GitHub repository.
Read the full research paper for in-depth information.
Tags: ML News