Google Introduces A New Method For Training Stereo Models

Google uses AI to generate training data, potentially addressing a long-standing problem in computer vision.
Created on April 7|Last edited on April 10
One exciting possibility for the future of AI is using AI models to generate training data for other AI models, creating a snowball effect of rapidly improving performance. Every new ML method comes with its own set of pros and cons, and methods that predict depth from images are no exception.

The Classic Problem

Stereo depth estimation, which recovers depth information from two rectified images of the same scene, is a long-standing problem in computer vision. Hand-crafted algorithms were traditionally used, but deep learning has transformed the field, and end-to-end networks are now the dominant solution. However, these networks require a large amount of annotated data, which is difficult to obtain for depth estimation.
Because the core difficulty is obtaining labeled training data efficiently, Google researchers have proposed generating this data with Neural Radiance Fields (NeRFs). A NeRF is trained on images of an object or scene captured from different perspectives and can then render an image of it from a new perspective.
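To make the stereo setup concrete: in a rectified pair, a point's horizontal shift between the two images (its disparity) is inversely proportional to its depth. The sketch below shows that standard pinhole-camera relation, depth = focal length × baseline / disparity. The function names and the example numbers are illustrative assumptions, not values from the paper.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Depth in metres of a point seen with the given disparity.

    Assumes a rectified stereo pair from two identical pinhole cameras
    separated horizontally by `baseline_m`.
    """
    if disparity_px < 0:
        raise ValueError("disparity must be non-negative for a rectified pair")
    if disparity_px == 0:
        # Zero disparity corresponds to a point infinitely far away.
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


def depth_to_disparity(depth_m, focal_length_px, baseline_m):
    """Inverse relation: disparity in pixels for a point at `depth_m`."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_length_px * baseline_m / depth_m


if __name__ == "__main__":
    # Hypothetical camera: 1000 px focal length, 10 cm baseline.
    print(disparity_to_depth(50.0, 1000.0, 0.1))
    print(depth_to_disparity(4.0, 1000.0, 0.1))
```

A stereo network predicts the disparity for every pixel, so labeled training data means an image pair plus a per-pixel disparity (or depth) map, which is the part that is expensive to collect.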

Old Meets New

3D depth perception has typically been achieved through a few different methods, including supervised stereo matching and self-supervised stereo prediction. The issue with supervised methods is that they require massive amounts of labeled data, which is expensive to generate.
Self-supervised methods, on the other hand, struggle to generalize to new domains and perform poorly under occlusions. To get around both limitations, the researchers trained a NeRF model that can generate image pairs from any desired perspective and then used these new image pairs as training data for a supervised stereo model.
The result is an essentially unlimited supply of training data and, according to the researchers, state-of-the-art performance.
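As a rough illustration of what "generating image pairs from any desired perspective" involves, the sketch below computes the pose of a virtual right camera by shifting a chosen left camera sideways by a stereo baseline. It is not the paper's implementation. It assumes a camera-to-world rotation matrix whose first column is the camera's x-axis (pointing right), and the function names are hypothetical. Rendering an image at each pose would be done by the trained NeRF and is not shown.

```python
def right_camera_center(rotation_c2w, center_left, baseline):
    """Position of a virtual right camera for a rectified stereo pair.

    rotation_c2w: 3x3 camera-to-world rotation as nested lists; its first
                  column is assumed to be the camera x-axis in world space.
    center_left:  [x, y, z] position of the left camera in world space.
    baseline:     distance between the two cameras, in world units.
    """
    x_axis = [row[0] for row in rotation_c2w]
    return [c + baseline * a for c, a in zip(center_left, x_axis)]


def make_stereo_pair_poses(rotation_c2w, center_left, baseline):
    """Return (left_pose, right_pose), each a (rotation, center) tuple.

    Both cameras share the same rotation, so the resulting pair is
    rectified by construction: only the camera position differs.
    """
    left = (rotation_c2w, list(center_left))
    right = (rotation_c2w,
             right_camera_center(rotation_c2w, center_left, baseline))
    return left, right


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]]
    left_pose, right_pose = make_stereo_pair_poses(identity, [0.0, 0.0, 0.0], 0.1)
    print(left_pose[1], right_pose[1])
```

Because the poses are chosen freely, the same scene can yield many pairs with different viewpoints and baselines, which is where the "essentially unlimited" supply of data comes from.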

Future Opportunities

Combining different AI models and methods has significant potential to improve performance.
In this case, pairing a supervised stereo model with NeRF-based data synthesis lets researchers create image pairs from any desired perspective, giving the stereo model an essentially limitless supply of training data.
The paper notes that this method could also work for problems beyond 3D depth perception, including optical flow estimation and multi-view stereo, so look out for improvements in these areas soon!

The Paper:

Tags: ML News