Meta AI Releases Demo Suite Featuring Self-Supervised Learning Capabilities With DINO
Last year Meta AI published a report on its work in self-supervised learning, introducing a new method it called DINO. Today it released a collection of demos illustrating the image recognition capabilities of a model trained with that method.
Created on April 8 | Last edited on May 25
Meta AI has been working on a model trained with DINO, the self-supervised learning method it released last year, and today it released a suite of demos to illustrate what the method can do.
Read Meta AI's post about it here: https://ai.facebook.com/blog/dino-self-supervised-learning-demo/
What is DINO and how does self-supervised learning work?
Last year Meta AI released DINO, a self-supervised learning method designed for training Vision Transformers without labels. DINO focuses on training models to understand object segmentation within images and video, something computer vision systems have traditionally struggled with. Being able to tell the subjects or actors of a video apart from the background is important, and it is something our human brains do effortlessly. DINO aims to move models closer to that human ability.

Self-supervised learning, the branch of machine learning that the DINO method belongs to, might sound like an unrealistic dream the first time you hear about it. With self-supervised learning, the model learns from unlabeled data, essentially working out how to differentiate and categorize things on its own terms. The DINO algorithm is designed to help the model succeed at this kind of learning.
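To make the idea more concrete, here is a minimal sketch of the kind of objective DINO is built around. It is based on the description in the DINO paper rather than on anything in Meta AI's demo post: a student network is trained to match the output distribution of a teacher network on different views of the same unlabeled image, and the teacher's weights are an exponential moving average of the student's. The function names, temperatures, and momentum value are illustrative assumptions, and plain Python lists stand in for real network outputs and weights.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_total for x in scaled]


def dino_style_loss(student_logits, teacher_logits, center,
                    student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between a centered, sharpened teacher distribution
    and the student distribution. No labels are involved.

    The temperature defaults are assumptions chosen for illustration.
    """
    # Teacher: subtract a running center, then sharpen with a low temperature.
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = [math.exp(v) for v in log_softmax(centered, teacher_temp)]
    # Student: softer distribution, kept in log space for stability.
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))


def ema_update(teacher_weights, student_weights, momentum=0.996):
    """Move each teacher weight slightly toward the matching student weight."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_weights, student_weights)]


# Hypothetical outputs for two views of the same unlabeled image.
teacher_out = [2.0, 0.0, -1.0]
center = [0.0, 0.0, 0.0]
loss_agree = dino_style_loss([2.0, 0.0, -1.0], teacher_out, center)
loss_disagree = dino_style_loss([-1.0, 0.0, 2.0], teacher_out, center)
print(loss_agree, loss_disagree)
```

The loss is small when the student agrees with the teacher and large when it does not, so minimizing it pushes the student toward consistent predictions across views without any human-provided labels.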
Read more about DINO on Meta AI's post about it last year here: https://ai.facebook.com/blog/dino-paws-computer-vision-with-self-supervised-transformers-and-10x-more-efficient-training/
What do these DINO demos really show us?
The demos released by Meta AI show us what a model trained with the DINO method of self-supervised learning is really capable of. The model on display focuses on a variety of image recognition and classification tasks, particularly comparing images and finding similar qualities in other images.
The demos showcase a few things the model can do, including similar image retrieval, image segmentation, and, most interestingly, the ability to identify specific patches of one image and relate them to similar patches in many other images. For example, you can select the nose of a cat in one image and receive an avalanche of pictures featuring cats with adorable noses.
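One plausible way to think about the retrieval and patch-matching demos is as nearest-neighbor search over embeddings: the model turns each image (or image patch) into a vector, and similar content ends up with similar vectors. The sketch below illustrates that idea with cosine similarity. It is not Meta AI's implementation; the function names are hypothetical, and the tiny hand-written vectors stand in for embeddings a real model would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_similar(query, gallery, top_k=2):
    """Return the top_k (name, score) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in gallery.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical patch embeddings; a real model would output
# vectors with hundreds of dimensions.
gallery = {
    "cat_nose_a": [0.9, 0.1, 0.0],
    "cat_nose_b": [0.8, 0.2, 0.1],
    "dog_ear": [0.1, 0.9, 0.2],
    "grass": [0.0, 0.1, 0.9],
}
query_patch = [1.0, 0.1, 0.0]  # stands in for the selected cat-nose patch

results = retrieve_similar(query_patch, gallery, top_k=2)
for name, score in results:
    print(name, round(score, 3))
```

With these toy vectors, the two cat-nose patches rank above the unrelated ones. Whole-image retrieval works the same way, with one vector per image instead of one per patch.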

Find out more
Recommended Reading
DINO: Emerging Properties in Self-Supervised Vision Transformers
Breakdown of Emerging Properties in Self-Supervised Vision Transformers by Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski and Armand Joulin with Weights and Biases logging ⭐️.
Self Supervised Learning in Audio and Speech
Learning speech representations from raw audio in an unsupervised training fashion
Tags: ML News