What's the best data representation and the effect of Mixup?

Experiments to answer some key questions for the SETI Breakthrough Listen - E.T. Signal Search Kaggle competition.
Created on June 9|Last edited on March 24
﻿
Introduction
What's the best way to use the cadence snippet? Should we arrange the spectrograms channel-wise or spatially? Should we use all 6 spectrograms or only the target ones, which contain the alien signal?
Mixup gives a significant performance boost, but how large is the gain in percentage terms? And how sensitive are models trained with Mixup to random initialization?
Experimentation Setup
Framework: TensorFlow
Data: A randomly shuffled subset of 5000 examples. Its distribution is close to that of the full training data.
Backbone architecture: EfficientNetB0
Runs: Each experiment was run 3 times to get the mean and standard deviation.
Regularization: Early stopping with a patience of 5 epochs.
Other configs: The Adam optimizer was used. Refer to figure 1 for more config settings.
All the configs can be found below. (Please ignore the seed value.)
﻿
Run set1
﻿
Best way to use the Cadence snippet?
Channel-wise arrangement means stacking the spectrograms on top of each other along the channel axis. For 6 spectrograms arranged channel-wise, the resulting shape is (Height, Width, 6).
Spatial arrangement means placing the spectrograms side by side. For 6 spectrograms arranged spatially, the resulting shape is (Height, Width*6, 1).
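The two arrangements described above can be sketched with NumPy. This is a minimal illustration, not the code used in the experiments: the array name `cadence` and the small height and width are hypothetical placeholders chosen only so the shapes are easy to read.

```python
import numpy as np

# Hypothetical cadence: 6 spectrograms, each of shape (height, width).
# The sizes are illustrative placeholders, not taken from the report.
height, width = 4, 5
cadence = np.arange(6 * height * width, dtype=np.float32).reshape(6, height, width)

# Channel-wise: each spectrogram becomes one channel -> (height, width, 6).
channel_wise = np.stack(list(cadence), axis=-1)

# Spatial: spectrograms placed side by side along the width axis,
# with a single trailing channel -> (height, width * 6, 1).
spatial = np.concatenate(list(cadence), axis=1)[..., np.newaxis]

print(channel_wise.shape)  # (4, 5, 6)
print(spatial.shape)       # (4, 30, 1)
```

In the spatial layout every spectrogram keeps its own region of the image, so columns 5 to 9 of `spatial` hold the second spectrogram.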
Arrange all 6 spectrograms channel-wise vs spatially
Note: The 6-channel image was reduced to 3 channels with a Conv2D layer before being fed to the backbone model.
Spatial arrangement gives a gain of ~6% in the validation ROC-AUC metric.
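The report does not state the kernel size of the Conv2D layer used to go from 6 to 3 channels. As a rough sketch, assuming a 1x1 kernel, the reduction is a linear map applied across the channel axis at every pixel, which can be written in NumPy as below. The names `kernel` and `bias` and the random values are hypothetical; in a real model they are learned, and in Keras the equivalent layer would be `tf.keras.layers.Conv2D(filters=3, kernel_size=1)`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical channel-wise input of shape (height, width, 6).
x = rng.normal(size=(4, 5, 6)).astype(np.float32)

# Assumed 1x1 convolution: one (6 -> 3) weight matrix shared by every pixel,
# plus one bias per output channel. These would be learned during training.
kernel = rng.normal(size=(6, 3)).astype(np.float32)
bias = np.zeros(3, dtype=np.float32)

# Apply the per-pixel linear map across the channel axis.
reduced = np.einsum("hwc,ck->hwk", x, kernel) + bias

print(reduced.shape)  # (4, 5, 3)
```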
﻿
Run set10
﻿
Arrange only target spectrograms channel-wise vs spatially
Note: Target spectrograms are the ones that contain the alien signals.
The channel-wise arrangement improves significantly when only target spectrograms are used.
The spatial arrangement improves by about 1% when only target spectrograms are used.
However, note that the standard deviation is much higher when only the target spectrograms are arranged spatially.
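Selecting only the target spectrograms before arranging them could look like the sketch below. It assumes the on-target observations sit at positions 0, 2 and 4 of the cadence (an A-B-A-C-A-D pattern); that index list, the `cadence` array, and its small size are assumptions for illustration and are not stated in this report.

```python
import numpy as np

# Hypothetical cadence: 6 spectrograms, each of shape (height, width).
height, width = 4, 5
cadence = np.arange(6 * height * width, dtype=np.float32).reshape(6, height, width)

# Assumption: on-target observations are at even positions (A-B-A-C-A-D).
TARGET_INDICES = [0, 2, 4]
targets = cadence[TARGET_INDICES]  # (3, height, width)

# Target-only, channel-wise -> (height, width, 3).
target_channel_wise = np.moveaxis(targets, 0, -1)

# Target-only, spatial -> (height, width * 3, 1).
target_spatial = np.concatenate(list(targets), axis=1)[..., np.newaxis]

print(target_channel_wise.shape)  # (4, 5, 3)
print(target_spatial.shape)       # (4, 15, 1)
```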
﻿
Run set16
﻿
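Normalize individual spectrograms vs clip pixels and then normalize
Note: The code snippet below shows the difference between image-level normalization and clip-then-normalize. The two options are alternatives; apply one or the other.
import numpy as np
import tensorflow as tf

# Option 1: normalize (zero mean, unit variance along axis 0)
data = (data - np.mean(data, axis=0)) / np.std(data, axis=0)

# Option 2: clip to [-1, 3] and rescale to uint8 in [0, 255]
data = ((np.clip(data, -1, 3) + 1) / 4 * 255).astype(np.uint8)
# Normalize: convert to float32 in [0, 1]
data = tf.image.convert_image_dtype(data, tf.float32)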
With clip-then-normalize, the standard deviation is clearly reduced.
There is a ~1% improvement in the score.
So far, target-only spectrograms arranged spatially with clip-then-normalize give the best mean ROC-AUC score with the lowest standard deviation.
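To see how the two preprocessing options differ in output range, here is a small NumPy-only sketch. The function names and the heavy-tailed test array are hypothetical. The final division by 255 stands in for `tf.image.convert_image_dtype`, which rescales uint8 input to [0, 1] when converting to float32.

```python
import numpy as np

def standardize(data: np.ndarray) -> np.ndarray:
    """Zero mean and unit variance along axis 0."""
    return (data - np.mean(data, axis=0)) / np.std(data, axis=0)

def clip_then_normalize(data: np.ndarray) -> np.ndarray:
    """Clip to [-1, 3], quantize to uint8, then rescale to float32 in [0, 1]."""
    quantized = ((np.clip(data, -1, 3) + 1) / 4 * 255).astype(np.uint8)
    # Dividing by 255 mirrors the uint8 -> float32 conversion in TensorFlow.
    return quantized.astype(np.float32) / 255.0

rng = np.random.default_rng(0)
# Hypothetical spectrogram with occasional extreme values (heavy-tailed noise).
spectrogram = rng.standard_t(df=2, size=(8, 16)).astype(np.float32)

standardized = standardize(spectrogram)
clipped = clip_then_normalize(spectrogram)

print(standardized.min(), standardized.max())  # range depends on the outliers
print(clipped.min(), clipped.max())            # always within [0, 1]
```

Clipping bounds the influence of outlier pixels, which is one plausible reason for the lower run-to-run variance, though the report does not test that explanation directly.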
﻿
Run set12
﻿
Mixup Augmentation
Mixup augmentation is used by almost every team in this competition, and the reason is obvious from the results of my experiments. By the way, if you want to learn more about Mixup (and CutMix, AugMix, etc.), here's a blog post that I have written.
There's a huge ~7% gain in the score compared to the best score in the previous section.
The standard deviation is also lower than the previous best.
The model trained with Mixup trains for longer, and Mixup acts as a strong regularizer.
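For readers unfamiliar with Mixup, the idea is to train on convex combinations of pairs of examples and their labels. The sketch below is a minimal NumPy version, not the implementation used in these experiments: the function name `mixup_batch`, the value `alpha=0.2`, and the choice of a single mixing coefficient per batch are all assumptions for illustration.

```python
import numpy as np

def mixup_batch(images, labels, alpha=0.2, rng=None):
    """Blend each example with a randomly chosen partner from the same batch."""
    rng = np.random.default_rng() if rng is None else rng
    # One mixing coefficient for the whole batch, drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha)
    # Random pairing of examples within the batch.
    perm = rng.permutation(len(images))
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_images, mixed_labels, lam

rng = np.random.default_rng(0)
# Hypothetical batch of 8 spatially arranged inputs with binary targets.
images = rng.normal(size=(8, 4, 30, 1)).astype(np.float32)
labels = rng.integers(0, 2, size=(8, 1)).astype(np.float32)

mixed_images, mixed_labels, lam = mixup_batch(images, labels, rng=rng)
print(mixed_images.shape, mixed_labels.shape)  # (8, 4, 30, 1) (8, 1)
```

Because the labels become soft values between 0 and 1, the loss has to accept non-integer targets; binary cross-entropy does.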
﻿
Run set7
﻿
Conclusion
Use the spatial arrangement of spectrograms to get the best single model or k-fold models.
Use Mixup augmentation.
﻿