In this report, I am going to elaborate on my experiments with the EvoNorm layers proposed in Evolving Normalization-Activation Layers. In the paper, the authors attempt to unify normalization layers and activation functions into a single computation graph. The authors claim:
Several of these layers enjoy the property of being independent from the batch statistics.
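The S-series variants in particular (for example, EvoNorm-S0) compute their statistics per sample over channel groups, which is what makes them independent of the batch. Below is a minimal TensorFlow sketch of EvoNorm-S0, assuming NHWC inputs; the class name, the `groups` default, and the epsilon value are my own choices, not the paper's reference implementation.

```python
import tensorflow as tf

class EvoNormS0(tf.keras.layers.Layer):
    """Sketch of EvoNorm-S0: y = x * sigmoid(v * x) / group_std(x) * gamma + beta.
    All statistics are computed per sample, never across the batch."""

    def __init__(self, groups=8, epsilon=1e-5, **kwargs):
        super().__init__(**kwargs)
        self.groups = groups
        self.epsilon = epsilon

    def build(self, input_shape):
        # Per-channel parameters, broadcastable over NHWC feature maps.
        shape = (1, 1, 1, input_shape[-1])
        self.gamma = self.add_weight(name="gamma", shape=shape, initializer="ones")
        self.beta = self.add_weight(name="beta", shape=shape, initializer="zeros")
        self.v = self.add_weight(name="v", shape=shape, initializer="ones")

    def call(self, x):
        numerator = x * tf.nn.sigmoid(self.v * x)
        # Group the channels and take the standard deviation over H, W and
        # the channels inside each group -- per sample, not per batch.
        n, h, w = tf.shape(x)[0], tf.shape(x)[1], tf.shape(x)[2]
        c = x.shape[-1]
        group_shape = tf.stack([n, h, w, self.groups, c // self.groups])
        grouped_x = tf.reshape(x, group_shape)
        _, var = tf.nn.moments(grouped_x, axes=[1, 2, 4], keepdims=True)
        group_std = tf.sqrt(var + self.epsilon)
        normalized = tf.reshape(tf.reshape(numerator, group_shape) / group_std,
                                tf.shape(x))
        return normalized * self.gamma + self.beta
```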
I used Colab to perform my experiments. The authors tested the EvoNorm layers on MobileNetV2, ResNets, MnasNet, and EfficientNets. I decided to try out some quick experiments on a Mini Inception architecture as shown in this blog post, and I trained the models on the CIFAR-10 dataset.
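Each run only swaps the normalization-activation pair inside the network's convolution blocks. The sketch below shows roughly how I wired that up; the filter counts are placeholders, `EvoNormS0` refers to the layer sketched above, and the full Mini Inception topology is in the linked blog post rather than reproduced here.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_module(x, filters, kernel_size, use_evonorm=False):
    # Convolution block: only the norm/activation pair differs between the
    # BN-ReLU baseline and the EvoNorm run.
    x = layers.Conv2D(filters, kernel_size, padding="same", use_bias=False)(x)
    if use_evonorm:
        x = EvoNormS0()(x)  # the EvoNorm-S0 sketch from above
    else:
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    return x

def inception_module(x, filters_1x1, filters_3x3, use_evonorm=False):
    # Inception-style module: parallel 1x1 and 3x3 branches, concatenated
    # along the channel axis.
    branch_1 = conv_module(x, filters_1x1, (1, 1), use_evonorm)
    branch_3 = conv_module(x, filters_3x3, (3, 3), use_evonorm)
    return layers.Concatenate(axis=-1)([branch_1, branch_3])
```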
SGD params:
```python
opt = tf.keras.optimizers.SGD(lr=1e-2, momentum=0.9, decay=1e-2 / EPOCHS)
```
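For completeness, here is roughly how that optimizer plugs into a training run. The epoch budget, batch size, and the tiny stand-in model are my assumptions (the actual network is the Mini Inception model built from modules like the ones sketched earlier), and the `lr`/`decay` arguments match the TF 2.x optimizer API available on Colab at the time.

```python
import tensorflow as tf
from tensorflow.keras import layers

EPOCHS = 100      # placeholder; set to the actual training budget
BATCH_SIZE = 128  # assumed batch size

# Load CIFAR-10 and scale pixels to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Toy stand-in for the Mini Inception network, using the `inception_module`
# sketched earlier; flip `use_evonorm` to switch between the two variants.
inputs = layers.Input(shape=(32, 32, 3))
x = inception_module(inputs, 32, 32, use_evonorm=True)
x = layers.MaxPooling2D()(x)
x = inception_module(x, 64, 64, use_evonorm=True)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

opt = tf.keras.optimizers.SGD(lr=1e-2, momentum=0.9, decay=1e-2 / EPOCHS)
model.compile(optimizer=opt, loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, validation_data=(x_test, y_test),
          batch_size=BATCH_SIZE, epochs=EPOCHS)
```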
As we saw in this quick experimental setup, the EvoNorm layers fail to match the performance of BN-ReLU. However, this should not be treated as a foregone conclusion. I encourage you to try the EvoNorm layers out in your own experiments and let me know via Twitter (@RisingSayak) what you find.
Thanks to Hanxiao Liu (first author of the paper) for helping me correct the implementation.