Paper Reading Group: Nf-ResNets

The paper reading groups are supported by experiments, blogs & code implementations! This is your chance to come talk about the paper that interests you!
Andrea Pessl
After an insightful discussion on Revisiting ResNets together with Aravind Srinivas, join Aman Arora from Weights & Biases for the 3rd of the 4 papers in our paper reading group series on computer vision:

Characterizing Signal Propagation to Close the Performance Gap in Unnormalized ResNets [paper, blog]

NF-ResNets - June 8
Batch Normalization (BatchNorm) has been key in advancing deep learning research in computer vision, but in the past few years a new line of research has emerged that seeks to eliminate activation-normalizing layers entirely. In this paper, the authors seek to establish a general recipe for training deep ResNets without normalization layers that achieve test accuracies competitive with the state of the art!
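For intuition, here is a minimal PyTorch sketch (an illustration, not the authors' code) of the residual block form the paper analyzes, x_{l+1} = x_l + α·f(x_l / β_l): the input to each residual branch is downscaled by an expected standard deviation β, and the branch output is added back with a small scalar α, so activation variance can be reasoned about without BatchNorm. The class name, conv stack, and default values below are placeholders.

```python
import torch
import torch.nn as nn

class NFBlock(nn.Module):
    """Minimal sketch of a normalizer-free residual block:
    x_{l+1} = x_l + alpha * f(x_l / beta), with no BatchNorm.
    alpha and beta follow the paper's notation for controlling
    how activation variance grows with depth; the residual
    branch here is simplified for illustration."""

    def __init__(self, channels, alpha=0.2, beta=1.0):
        super().__init__()
        self.alpha = alpha
        self.beta = beta
        # Simplified residual branch: two 3x3 convs with ReLU, no norm layers.
        self.branch = nn.Sequential(
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        # Downscale the branch input to (roughly) unit variance,
        # then add the branch output back with a small weight alpha.
        return x + self.alpha * self.branch(x / self.beta)

# Quick shape check on a dummy input.
block = NFBlock(channels=64)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```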
We will continue with:

Register here for June 8, 9am PT / 6pm CET / 9:30pm IST

This is your chance to ask your burning questions!
Comment below with any questions you'd like answered as part of our next Paper Reading Group.

In the final session of our PRG series on four CV papers, we will cover:

EfficientNetV2: Smaller Models and Faster Training - June 29

After the massive success of the EfficientNet architecture, Mingxing Tan and Quoc V. Le have done it again! This time they have come up with a new family of networks with faster training speed and better parameter efficiency - EfficientNetV2! Will these networks enjoy the same success as EfficientNets? Most probably, yes!

For comments on our previous PRG from May 25, see this report:

Revisiting ResNets: Improved Training and Scaling Strategies

With over 63,000 citations, ResNets remain at the forefront of computer vision (CV) research even today. Most recent CV papers compare their results to ResNets to showcase improvements in accuracy, speed, or both.
❓: But, do such improvements on ImageNet top-1 accuracy come from model architectures or improved training and scaling strategies?
This is precisely the question that Bello et al. try to answer in their recent paper Revisiting ResNets: Improved Training and Scaling Strategies.

For comments on our previous Paper Reading Group from May 9, see this report:

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited; in vision, attention has typically been applied alongside, or used to replace parts of, convolutional networks (CNNs). In this paper, Dosovitskiy et al. show that this reliance on CNNs is not necessary and that a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. The paper summary can be found here.
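As a rough illustration (not the paper's reference implementation), the sketch below shows how an image can be split into non-overlapping 16x16 patches and linearly projected into a sequence of token embeddings that a standard Transformer can consume; the image size and embedding width are placeholder values.

```python
import torch
import torch.nn as nn

patch_size = 16   # each patch becomes one "word"
embed_dim = 768   # illustrative embedding width

image = torch.randn(1, 3, 224, 224)  # (batch, channels, H, W)

# A strided conv performs patchify + linear projection in one step:
# each 16x16x3 patch is mapped to a single embed_dim-dimensional vector.
to_patches = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)

tokens = to_patches(image)                  # (1, 768, 14, 14)
tokens = tokens.flatten(2).transpose(1, 2)  # (1, 196, 768): 196 patch tokens

print(tokens.shape)  # torch.Size([1, 196, 768])
```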