
Experimenting with EvoNorm Layers in TensorFlow 2

This article provides an experimental summary of implementing the EvoNorm layers proposed in the paper 'Evolving Normalization-Activation Layers'.
Created on April 19 | Last edited on October 11

Experimental Setup

In this report, I am going to walk through my experiments with the EvoNorm layers proposed in Evolving Normalization-Activation Layers. In the paper, the authors attempt to unify normalization layers and activation functions into a single computation graph. The authors claim:
Several of these layers enjoy the property of being independent from the batch statistics.
I used Colab to perform my experiments. The authors tested the EvoNorm layers on MobileNetV2, ResNets, MnasNet, and EfficientNets. I decided to try some quick experiments on a Mini Inception architecture, as shown in this blog post, and trained the models on the CIFAR-10 dataset.
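For reference, the data side of the setup is the standard Keras CIFAR-10 loader; the normalization below is an assumption about the preprocessing, not necessarily the exact code from the notebook.

```python
import tensorflow as tf

# Load CIFAR-10 and scale pixel values to [0, 1] (preprocessing is assumed).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
```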


[Run set: 2 runs]

I am going to compare the EvoNorm B0 and S0 layers against the following Mini Inception baselines:
  • Adam + BN-ReLU + No Data Augmentation
  • SGD + BN-ReLU + Data Augmentation
  • SGD + BN-ReLU + No Data Augmentation
(BN refers to Batch Normalization)
The EvoNorm authors refer to the batch-dependent layers as the EvoNorm-B series, as they involve batch aggregations and hence require maintaining moving-average statistics for inference. The EvoNorm-S series refers to batch-independent layers that rely on individual samples only (a desirable property that simplifies implementation and stabilizes training with small batch sizes).
It should also be noted that the EvoNorm layers perform quite well in tasks like instance segmentation with Mask R-CNN and image synthesis with BigGAN.
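To make the batch-independent idea concrete, here is a minimal sketch of what an EvoNorm-S0 layer computes in TF2, based on my reading of the paper: the nonlinearity x * sigmoid(v * x) is divided by a group-wise standard deviation, followed by a learned affine transform. Treat this as a sketch under those assumptions rather than the reference implementation; the Colab linked at the end has the code actually used for these runs.

```python
import tensorflow as tf

class EvoNormS0(tf.keras.layers.Layer):
    """Minimal sketch of the batch-independent EvoNorm-S0 computation.

    Assumes channels-last (NHWC) inputs and that `groups` divides the
    number of channels. Not the authors' reference implementation.
    """

    def __init__(self, groups=8, epsilon=1e-5, **kwargs):
        super().__init__(**kwargs)
        self.groups = groups
        self.epsilon = epsilon

    def build(self, input_shape):
        channels = int(input_shape[-1])
        param_shape = (1, 1, 1, channels)
        self.gamma = self.add_weight(name="gamma", shape=param_shape, initializer="ones")
        self.beta = self.add_weight(name="beta", shape=param_shape, initializer="zeros")
        self.v = self.add_weight(name="v", shape=param_shape, initializer="ones")

    def _group_std(self, x):
        # Standard deviation over (H, W, channels-within-group), as in group norm.
        shape = tf.shape(x)
        grouped = tf.reshape(x, [shape[0], shape[1], shape[2], self.groups, -1])
        _, var = tf.nn.moments(grouped, axes=[1, 2, 4], keepdims=True)
        std = tf.sqrt(var + self.epsilon)
        return tf.reshape(std * tf.ones_like(grouped), shape)

    def call(self, x):
        # EvoNorm-S0: x * sigmoid(v * x) / group_std(x), then a learned affine transform.
        return (x * tf.nn.sigmoid(self.v * x)) / self._group_std(x) * self.gamma + self.beta
```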


Adam + BN-ReLU + No Data Augmentation




[Run set: 1 run]


SGD + BN-ReLU + No Data Augmentation

SGD params:
opt = tf.keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9, decay=1e-2 / EPOCHS)
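For context, this optimizer plugs into a standard Keras training loop; `model` (the Mini Inception network) and `EPOCHS` are assumed to be defined in the notebook, and the batch size below is an assumption rather than the notebook's exact setting.

```python
# `model`, `EPOCHS`, and the CIFAR-10 arrays are assumed to be defined already.
model.compile(optimizer=opt,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history = model.fit(x_train, y_train,
                    validation_data=(x_test, y_test),
                    epochs=EPOCHS,
                    batch_size=128)  # batch size is an assumption
```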


[Run set: 1 run]


SGD + BN-ReLU + Data Augmentation
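For reference, a typical CIFAR-10 augmentation setup for this kind of run looks like the sketch below; the specific transforms and ranges are assumptions, not necessarily what the notebook uses.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Light augmentation commonly used for CIFAR-10 (ranges are assumptions).
augmenter = ImageDataGenerator(
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
)
train_flow = augmenter.flow(x_train, y_train, batch_size=128)
```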


[Run set: 1 run]


EvoNorm B0 + No Data Augmentation


[Run set: 1 run]


EvoNorm B0 + Data Augmentation


[Run set: 1 run]


EvoNorm S0 + No Data Augmentation + Groups8



[Run set: 1 run]

With EvoNorm S0 (groups of 8) and no data augmentation, we again see that the validation loss is higher than in the previous experiment. The training and validation accuracies also diverge from each other. The network is not generalizing well in this case either.
A note on the groups hyperparameter in the EvoNorm layers:
groups controls how many groups the channels are split into for the group-wise aggregation, similar to group normalization. The authors report which group settings work well across tasks in the original paper.
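With a layer like the EvoNormS0 sketch shown earlier (the class name comes from my sketch, not the official code), the groups hyperparameter is just a constructor argument, so swapping a Conv-BN-ReLU block for a Conv-EvoNorm block with a given group count looks roughly like this:

```python
# A Conv-EvoNorm block in place of the usual Conv-BN-ReLU block; 8 channel
# groups here, mirroring the "Groups8" runs. `groups` must divide `filters`.
def conv_evonorm(x, filters, groups=8):
    x = tf.keras.layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    return EvoNormS0(groups=groups)(x)
```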

EvoNorm S0 + No Data Augmentation + Groups16



[Run set: 1 run]


EvoNorm S0 + No Data Augmentation + Groups32



[Run set: 1 run]


Observations on EvoNorm S0 layers without data augmentation

sweep_config = {
    "method": "random",
    "metric": {
        "name": "accuracy",
        "goal": "maximize"
    },
    "parameters": {
        "groups": {
            "values": [4, 8, 12, 16, 32]
        },
        "epochs": {
            "values": [10, 20, 30, 40, 50, 60]
        },
        "learning_rate": {
            "values": [1e-2, 1e-3, 1e-4, 3e-4, 3e-5, 1e-5]
        },
        "optimizer": {
            "values": ["adam", "sgd"]
        }
    }
}

  • SGD + BN-ReLU + Data Augmentation shows the most stable training behavior so far.
  • If we look closely, all the EvoNorm S0 experiments without data augmentation (except groups of 32) show stable training behavior up until ~12 epochs.
  • This is the case for EvoNorm B0 + No Data Augmentation as well.
  • One thing that might help here is tuning the learning rate and groups hyperparameters further.
This is why I decided to run a [hyperparameter sweep](https://docs.wandb.com/sweeps) with the search space shown above.
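Launching the sweep with the config above looks roughly like the sketch below; the project name and the `train` function (which would read the sampled hyperparameters from wandb.config) are assumptions about the notebook, not shown here.

```python
import wandb

# Register the sweep and let an agent sample configurations from it.
# `train` is the training function that reads wandb.config (assumed).
sweep_id = wandb.sweep(sweep_config, project="evonorm-tf2")  # project name is an assumption
wandb.agent(sweep_id, function=train)
```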

Hyperparameter sweep on EvoNorm S0 layers without data augmentation




[Run set: 9 runs]


EvoNorm S0 + Data Augmentation + Groups8



[Run set: 3 runs]



Final remarks

As we saw in this quick experimental setup, the EvoNorm layers fail to match the performance of BN-ReLU. But this should not be treated as a foregone conclusion. I encourage you to try the EvoNorm layers in your own experiments and let me know via Twitter (@RisingSayak) what you find.

👉 Colab notebook to reproduce results.

Acknowledgement

Thanks to Hanxiao Liu (first author of the paper) for helping me correct the implementation.
