Paper Reading Group: Vision Transformers

The paper reading groups are supported by experiments, blogs & code implementations! This is your chance to come talk about the paper that interests you!
Aman Arora

In our next paper reading group, Aman Arora from Weights & Biases is discussing the 1st of 4 papers from our PRG series:

1. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [paper, blog]

We will continue with:

2. Revisiting ResNets: Improved Training and Scaling Strategies [paper, blog] - May 25
3. EfficientNetV2: Smaller Models and Faster Training [paper, blog] - June 8
4. Characterizing Signal Propagation to Close the Performance Gap in Unnormalized ResNets [paper, blog] - June 22

Register here for the May 09, 12pm PT Reading Group.

This is your chance to ask your burning questions!
Comment below any questions that you'd like to be answered as part of our next Paper Reading Group.

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In this paper, Dosovitskiy et al. show that this reliance on CNNs is not necessary and that a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. A paper summary can be found here.
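The "sequence of image patches" idea at the heart of ViT can be sketched in a few lines. The following is an illustrative example (not the authors' code, and names like `image_to_patches` are our own): an image is cut into non-overlapping 16x16 patches, each patch is flattened into a vector, and the resulting sequence is what the transformer consumes in place of word tokens.

```python
import numpy as np

def image_to_patches(image, patch_size=16):
    """Split an (H, W, C) image into flattened patch vectors.

    Returns an array of shape (num_patches, patch_size * patch_size * C) --
    ViT's sequence of "words". Hypothetical helper for illustration only.
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    # Split the spatial axes into (grid, patch) pairs, then group each patch.
    patches = image.reshape(h // patch_size, patch_size,
                            w // patch_size, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (grid_h, grid_w, ph, pw, c)
    return patches.reshape(-1, patch_size * patch_size * c)

# A 224x224 RGB image yields a sequence of 14 * 14 = 196 patches,
# each flattened to 16 * 16 * 3 = 768 values.
image = np.zeros((224, 224, 3))
patches = image_to_patches(image)
print(patches.shape)  # (196, 768)
```

In the actual model, each flattened patch is then mapped by a learned linear projection to the transformer's embedding dimension, and a position embedding is added before the encoder.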

In upcoming sessions of our PRG series on 4 CV papers we will cover:

Revisiting ResNets: Improved Training & Scaling Strategies - May 25

With over 63,000 citations, ResNets remain at the forefront of computer vision (CV) research even today. Most recent CV papers compare their results against ResNets to showcase improvements in accuracy, speed, or both.
❓: But, do such improvements on ImageNet top-1 accuracy come from model architectures or improved training and scaling strategies?
This is precisely the question that Bello et al try to answer in their recent paper Revisiting ResNets: Improved Training and Scaling Strategies.

EfficientNetV2: Smaller Models and Faster Training - June 8

After the massive success of the EfficientNet architecture, Mingxing Tan and Quoc V. Le have done it again! This time they have come up with a new family of networks with faster training speed and better parameter efficiency - EfficientNetV2! Will these networks enjoy the same success as EfficientNets? Most probably, yes!

Characterizing Signal Propagation to Close the Performance Gap in Unnormalized ResNets - June 22

Another recent key advancement comes from DeepMind researchers Andrew Brock, Soham De, Samuel L. Smith & Karen Simonyan. Thanks to their work, it's now possible to train networks without normalization that reach state-of-the-art accuracy on ImageNet! Are normalizer-free networks going to be the new norm?