EvoNorm layers in TensorFlow 2

Experimental summary of my implementation of EvoNorm layers proposed in https://arxiv.org/pdf/2004.02967.pdf. Made by Sayak Paul using Weights & Biases

Experimental setup

In this report, I am going to elaborate on my experiments with the EvoNorm layers proposed in Evolving Normalization-Activation Layers. In the paper, the authors attempt to unify normalization layers and activation functions into a single computation graph. The authors claim:

Several of these layers enjoy the property of being independent from the batch statistics.

I used Colab to perform my experiments. The authors tested the EvoNorm layers on MobileNetV2, ResNets, MnasNet, and EfficientNets. I decided to try out some quick experiments on a Mini Inception architecture as shown in this blog post, training the models on the CIFAR10 dataset.

👉 GitHub repo to reproduce results.

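Before getting into the results, here is a rough idea of what these layers compute. Below is a minimal sketch of EvoNorm-B0 as a custom tf.keras layer, written from the paper's formulation: the input is divided by the maximum of its batch standard deviation and v1 * x plus its instance standard deviation, then passed through a learned affine transform. This is an illustrative simplification assuming 4D channels-last inputs, not the exact code from my repository.

import tensorflow as tf

class EvoNormB0(tf.keras.layers.Layer):
    # Illustrative EvoNorm-B0: y = x / max(batch_std(x), v1 * x + instance_std(x)) * gamma + beta
    def __init__(self, momentum=0.9, epsilon=1e-5, **kwargs):
        super().__init__(**kwargs)
        self.momentum = momentum
        self.epsilon = epsilon

    def build(self, input_shape):
        shape = (1, 1, 1, input_shape[-1])
        self.gamma = self.add_weight(name="gamma", shape=shape, initializer="ones")
        self.beta = self.add_weight(name="beta", shape=shape, initializer="zeros")
        self.v1 = self.add_weight(name="v1", shape=shape, initializer="ones")
        # Running estimate of the batch variance, used at inference time.
        self.running_var = self.add_weight(
            name="running_var", shape=shape, initializer="ones", trainable=False)

    def call(self, x, training=False):
        if training:
            # Variance over the batch and spatial dimensions, per channel.
            _, batch_var = tf.nn.moments(x, axes=[0, 1, 2], keepdims=True)
            self.running_var.assign(
                self.momentum * self.running_var + (1.0 - self.momentum) * batch_var)
        else:
            batch_var = self.running_var
        batch_std = tf.sqrt(batch_var + self.epsilon)

        # Instance standard deviation: per sample, over the spatial dimensions only.
        _, inst_var = tf.nn.moments(x, axes=[1, 2], keepdims=True)
        inst_std = tf.sqrt(inst_var + self.epsilon)

        denominator = tf.maximum(batch_std, self.v1 * x + inst_std)
        return x / denominator * self.gamma + self.beta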

Adam + BN-ReLU + No Data Augmentation

SGD + BN-ReLU + No Data Augmentation

SGD params:

# Assumes `import tensorflow as tf`; EPOCHS is the total number of training epochs
opt = tf.keras.optimizers.SGD(lr=1e-2, momentum=0.9, decay=1e-2 / EPOCHS)

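For context, the optimizer above is simply handed to model.compile. Here is a quick sketch of how it slots into training; mini_inception is a placeholder for the model builder from the blog post rather than an actual function name from my repository.

# `mini_inception` and EPOCHS are placeholders; the CIFAR10 loader is the standard Keras one.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = mini_inception(input_shape=(32, 32, 3), num_classes=10)
model.compile(optimizer=opt, loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, validation_data=(x_test, y_test),
          epochs=EPOCHS, batch_size=128)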

SGD + BN-ReLU + Data Augmentation

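The augmentation pipeline is not spelled out in this report. One way to reproduce the "Data Augmentation" runs is a light Keras ImageDataGenerator setup like the one below; the exact transforms are my assumption, not a record of what the runs used.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Assumed light augmentation for CIFAR10; the transforms in the actual runs may differ.
augmenter = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
)
train_flow = augmenter.flow(x_train, y_train, batch_size=128)
model.fit(train_flow, validation_data=(x_test, y_test), epochs=EPOCHS)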

EvoNorm B0 + No Data Augmentation

EvoNorm B0 + Data Augmentation

EvoNorm S0 + No Data Augmentation + Groups8

EvoNorm S0 + No Data Augmentation + Groups16

EvoNorm S0 + No Data Augmentation + Groups32

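Unlike B0, the S0 variant is completely free of batch statistics: it normalizes each sample with grouped statistics, and the number of groups (8, 16, and 32 above) is its main hyperparameter. Here is a minimal sketch following the paper's formula, y = x * sigmoid(v1 * x) / group_std(x) * gamma + beta, again assuming channels-last inputs whose channel count is divisible by the group count, and again not the exact code from my repository.

import tensorflow as tf

class EvoNormS0(tf.keras.layers.Layer):
    # Illustrative EvoNorm-S0: y = x * sigmoid(v1 * x) / group_std(x) * gamma + beta
    def __init__(self, groups=8, epsilon=1e-5, **kwargs):
        super().__init__(**kwargs)
        self.groups = groups
        self.epsilon = epsilon

    def build(self, input_shape):
        shape = (1, 1, 1, input_shape[-1])
        self.gamma = self.add_weight(name="gamma", shape=shape, initializer="ones")
        self.beta = self.add_weight(name="beta", shape=shape, initializer="zeros")
        self.v1 = self.add_weight(name="v1", shape=shape, initializer="ones")

    def _group_std(self, x):
        # Standard deviation within each group of channels, computed per sample.
        shape = tf.shape(x)
        grouped = tf.reshape(
            x, [shape[0], shape[1], shape[2], self.groups, x.shape[-1] // self.groups])
        _, var = tf.nn.moments(grouped, axes=[1, 2, 4], keepdims=True)
        std = tf.sqrt(var + self.epsilon)
        return tf.reshape(tf.broadcast_to(std, tf.shape(grouped)), shape)

    def call(self, x):
        return x * tf.sigmoid(self.v1 * x) / self._group_std(x) * self.gamma + self.beta

To run the experiments above, every BN-ReLU pair in the Mini Inception network is swapped for a single layer like this, with groups set to 8, 16, or 32.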

Observations on EvoNorm S0 layers without data augmentation

Hyperparameter sweep on EvoNorm S0 layers without data augmentation

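For reference, this is roughly how such a sweep is wired up with the Weights & Biases sweeps API. The swept hyperparameters below (group count and learning rate), the project name, and the build_and_train helper are illustrative placeholders, not a record of the actual sweep configuration.

import wandb

# Illustrative sweep; the configuration used for the actual runs is not reproduced here.
sweep_config = {
    "method": "grid",
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "groups": {"values": [8, 16, 32]},
        "learning_rate": {"values": [1e-2, 1e-3]},
    },
}

sweep_id = wandb.sweep(sweep_config, project="evonorm-tf2")

def train():
    # `build_and_train` stands in for the actual training routine.
    with wandb.init() as run:
        build_and_train(groups=run.config.groups, learning_rate=run.config.learning_rate)

wandb.agent(sweep_id, function=train)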

EvoNorm S0 + Data Augmentation + Groups8

Final remarks

As we saw in this quick experimental setup, the EvoNorm layers failed to match the performance of BN-ReLU. This should not be treated as a foregone conclusion, though. I encourage you to try the EvoNorm layers out in your own experiments and let me know via Twitter (@RisingSayak) what you find.

👉 Colab notebook to reproduce results.

Acknowledgement

Thanks to Hanxiao Liu (first author of the paper) for helping me correct the implementation.