
What's New in Computer Vision?

A hand-curated list of recent developments in computer vision that I find interesting.
Created on March 12 | Last edited on April 23
Welcome to our weekly Computer Vision Digest at W&B! Our goal is to bring you bite-sized, hand-curated stories, reports, papers, videos, and content about what's happening in the computer vision world. If you have anything you'd like to suggest or things you'd like to see, the comments are always open and we'd love to give our community what they want.
In this week's inaugural digest, we will focus on the new dark matter of intelligence - Self-Supervised Learning. A lot of progress was made in the past few months and finally, self-supervision is showing a LOT of promise. Let's dig in:

🐲 Self-supervised learning: The dark matter of intelligence

This was published on March 4th, 2021 by Yann LeCun and Ishan Misra on Facebook AI's blog. If you're exploring the self-supervised space, this is a good starting point to catch up on the substantive progress that's been made in the past few months. A taste:
Supervised learning is a bottleneck for building more intelligent generalist models that can do multiple tasks and acquire new skills without massive amounts of labeled data. Practically speaking, it’s impossible to label everything in the world.
This resonated with us. Training our vision models without expensive ground truth labels is crucial for future progress in deep learning. Self-supervised learning is currently showing the way ahead and is capable of approximating a kind of "common sense" in modern AI systems.
Don't have time for the blog? Yannic "Lightning" Kilcher made a video summary:


You can find the blog post here.

🌟 A Simple Framework for Contrastive Learning of Visual Representations

This paper by Chen et al. presents a simple yet effective framework for training computer vision-based models in self-supervised ways using a contrastive learning strategy.
Contrastive methods are based on a simple idea: construct pairs of x and y that are compatible and pairs that are not, then adjust the parameters of the model so that the output energy is low for compatible pairs and large for incompatible ones. It works:
SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over the previous state of the art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100× fewer labels.
How does it work, though? First, a randomly sampled mini-batch of unlabeled images is transformed with a stochastic data augmentation policy, producing two correlated views of each image. The views are then forward passed through an encoder and a projection head, and the model is trained to pull representations of the same image together using the so-called Normalized Temperature-Scaled Cross-Entropy Loss (NT-Xent).
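To make that loss concrete, here is a minimal NT-Xent sketch in TensorFlow. This is not the authors' reference implementation; the function name, batch layout, and default temperature are my own assumptions:

```python
import tensorflow as tf

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Minimal NT-Xent sketch: z_a and z_b are [N, d] projections of two
    augmented views of the same N images (names and defaults are assumptions)."""
    # L2-normalize so dot products are cosine similarities.
    z_a = tf.math.l2_normalize(z_a, axis=1)
    z_b = tf.math.l2_normalize(z_b, axis=1)
    z = tf.concat([z_a, z_b], axis=0)                       # [2N, d]
    n = tf.shape(z_a)[0]

    # Pairwise similarities, scaled by the temperature.
    sim = tf.matmul(z, z, transpose_b=True) / temperature   # [2N, 2N]

    # Mask out self-similarity so an image is never its own negative.
    sim -= 1e9 * tf.eye(2 * n)

    # For row i, the positive is the other augmented view of the same image.
    positives = tf.concat([tf.range(n, 2 * n), tf.range(0, n)], axis=0)

    # Softmax cross-entropy per row: pull the positive up, push the rest down.
    loss = tf.keras.losses.sparse_categorical_crossentropy(
        positives, sim, from_logits=True)
    return tf.reduce_mean(loss)
```

During training, z_a and z_b would simply be the projection-head outputs for the two augmented views of the same mini-batch.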
Sayak Paul summarized this paper in his Weights and Biases report titled "Towards Self-Supervised Image Understanding with SimCLR" with a minimal implementation of SimCLR in TensorFlow.

🙌 Unsupervised Learning of Visual Features by Contrasting Cluster Assignments - SwAV

This paper by Caron et al. is one of my favorites in this space. SwAV was published in July 2020 and was, at the time, the state of the art in self-supervised learning for visual recognition. SwAV outperforms SimCLR.
A little more detail: SwAV uses a clustering-based approach and introduces an online cluster assignment method, which lets the algorithm scale to large, unlabeled datasets where earlier clustering-based methods struggled. Rather than comparing image features directly, it predicts the cluster assignment of one augmented view from the representation of another.
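As a rough illustration of what "online cluster assignment" means here, below is a hedged TensorFlow sketch of the Sinkhorn-Knopp-style normalization such methods use to turn a batch of prototype scores into balanced soft assignments. The function name, iteration count, and epsilon are assumptions rather than the authors' reference code:

```python
import tensorflow as tf

def sinkhorn_assignments(scores, n_iters=3, epsilon=0.05):
    """Soft cluster assignments for a batch, SwAV-style (sketch).

    scores: [B, K] similarities between B features and K learnable prototypes.
    Returns a [B, K] matrix whose rows sum to 1 and whose columns are roughly
    balanced, so the batch spreads evenly over the prototypes."""
    q = tf.exp(scores / epsilon)
    q /= tf.reduce_sum(q)
    b = tf.cast(tf.shape(q)[0], q.dtype)   # batch size
    k = tf.cast(tf.shape(q)[1], q.dtype)   # number of prototypes
    for _ in range(n_iters):
        # Normalize columns: each prototype receives ~1/K of the total mass.
        q /= tf.reduce_sum(q, axis=0, keepdims=True)
        q /= k
        # Normalize rows: each sample distributes ~1/B of the total mass.
        q /= tf.reduce_sum(q, axis=1, keepdims=True)
        q /= b
    return q * b   # rescale so each row is a proper distribution
```

Computing these assignments per batch, instead of clustering the whole dataset offline, is what makes the method "online" and cheap enough to scale.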
Sayak Paul and Ayush Thakur summarized this paper in a Weights and Biases report titled "Unsupervised Visual Representation Learning with SwAV" with a minimal implementation in TensorFlow.

🔥 Self-supervised Pretraining of Visual Features in the Wild - SEER

This paper by Goyal et al. presents the current state of the art in self-supervised visual representation learning, but it might not be that easy to replicate.
Based on SwAV, SEER is a billion-parameter model that has proven to work efficiently with complex, high-dimensional image data (which is also why it's not particularly easy to replicate). So far, self-supervised methods have mostly shown great results in a controlled environment, i.e., the highly curated ImageNet dataset.
That said, the promise of self-supervised learning is to learn rich features from data in the wild without supervision. The authors of SEER explored whether self-supervision lives up to this expectation by training large models on random, unlabeled, and uncurated public Instagram images. It does:
The final SElf-supERvised (SEER) model, a RegNetY with 1.3B parameters trained on 1B random images with 512 GPUs achieves 84.2% top-1 accuracy, surpassing the best self-supervised pretrained model by 1% and confirming that self-supervised learning works in a real world setting.

🎶 Shopee - Price Match Guarantee

Check out this newly released Kaggle competition.
Finding near-duplicates in large datasets is an important problem for many online businesses. In Shopee's case, everyday users can upload their own images and write their own product descriptions, adding an extra layer of challenge. Your task is to identify which products have been posted repeatedly. The differences between related products may be subtle while photos of identical products may be wildly different!
This competition seems ideal to test out self-supervised learning approaches.
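If you go that route, one simple baseline (a sketch under my own assumptions, not a competition-proven solution) is to train an encoder with a self-supervised objective like the ones above, embed every product image, and treat any pair above a cosine-similarity threshold as a candidate match:

```python
import tensorflow as tf

def candidate_matches(embeddings, threshold=0.8):
    """embeddings: [N, d] product-image embeddings from a (self-supervised)
    encoder; the 0.8 threshold is a placeholder to tune on validation data."""
    # L2-normalize so the dot product equals cosine similarity.
    embeddings = tf.math.l2_normalize(embeddings, axis=1)
    sims = tf.matmul(embeddings, embeddings, transpose_b=True)  # [N, N]
    # Push the diagonal below any threshold so items don't match themselves.
    sims -= 2.0 * tf.eye(tf.shape(embeddings)[0])
    # Index pairs (i, j) that look like postings of the same product.
    return tf.where(sims > threshold)
```

For a real submission you would likely combine image and title embeddings and replace the full N×N similarity matrix with an approximate nearest-neighbor index.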


Conclusion

That's it for this week, but we'll be back in seven days. If you have anything you'd like to see, any stories you think we missed, or any W&B reports that you're dying to share, the comments are right down there. 👇 Thanks for joining us!