Is ImageNet21k a Better Dataset for Transfer Learning in Steganalysis?

This informal report describes short experiments comparing ImageNet21k and ImageNet1k on a steganalysis task. Made by Yassine Yousfi using Weights & Biases
Steganography is the art of covert communication in which secrets are hidden in ordinary-looking cover objects, in this case JPEG images. Typically, secret messages are embedded in JPEG cover images by applying ±1 changes to the quantized DCT coefficients.
Steganalysis is the detection of these steganographic messages. It is usually formulated as a binary classification problem: given a set of cover and stego images, security is determined by how easily a knowledgeable steganalyst can distinguish the two classes (cf. Kerckhoffs's principle).
Below, you'll find an example of a cover/stego pair. Notice how the two images are visually identical (and statistically very close), while the stego image contains a simulated message of about 1 kbit. Also notice how DCT blocks in the clear sky region receive fewer changes than those in the more complex regions of the image; this is called content-adaptive steganography.
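As an illustration, the ±1 embedding described above can be simulated directly on quantized DCT coefficients. The sketch below is deliberately non-adaptive (changes are placed uniformly at random), unlike the content-adaptive schemes discussed here; the function name and change rate are illustrative, not part of any real embedding scheme:

```python
import numpy as np

def simulate_embedding(dct_coeffs, change_rate=0.05, seed=0):
    """Illustrative (non-adaptive) simulation of JPEG steganographic embedding:
    flip a random fraction of quantized DCT coefficients by +/-1.
    Real content-adaptive schemes concentrate changes in textured regions
    instead of selecting them uniformly at random."""
    rng = np.random.default_rng(seed)
    stego = dct_coeffs.copy()
    mask = rng.random(stego.shape) < change_rate   # which coefficients to change
    signs = rng.choice([-1, 1], size=stego.shape)  # direction of each change
    stego[mask] += signs[mask]
    return stego
```

Note that the resulting stego array differs from the cover only by ±1 in a sparse set of positions, which is why the pair is visually indistinguishable.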
ImageNet21k, meanwhile, is a larger version of ImageNet. It contains around 12.4M images (after cleaning), far more than ImageNet1k's roughly 1.3M images. More details about ImageNet21k can be found in [1].
In [1], the authors show that for certain downstream tasks, pretraining on ImageNet21k achieves better performance than pretraining on ImageNet1k. In this report, we study the effect of pretraining on a steganalysis task. We compare a few different pretraining strategies for the same architectures:
(Note that the last pretraining strategy makes for a rather unfair comparison with the first two: JIN is a steganalysis-specific dataset and is semantically very close to the downstream task.)
Some additional details:
And finally, we use the following performance metrics to compare the detectors:
P_{\mathrm{E}} = \min_{P_{\mathrm{FA}}} \frac{1}{2}\left(P_{\mathrm{FA}} + P_{\mathrm{MD}}(P_{\mathrm{FA}})\right)\\ \text{MD5} = P_{\mathrm{MD}}(P_{\mathrm{FA}}=0.05)\\ \text{wAUC} = \int_{0}^{1} w(P_{\mathrm{D}}(P_{\mathrm{FA}}))\,P_{\mathrm{D}}(P_{\mathrm{FA}})\,\mathrm{d}P_{\mathrm{FA}}
where P_\mathrm{D} and P_\mathrm{FA} are the probability of detection and the probability of false alarm on the validation set, and P_\mathrm{MD} = 1 - P_\mathrm{D} is the probability of missed detection. wAUC is a weighted AUC: the weighting function w(P_{\mathrm{D}}) gives the region where P_\mathrm{D} is between 0 and 0.4 a weight of 2, and the rest a weight of 1. The wAUC was the scoring metric for the ALASKA II Kaggle competition [10], while MD5 was the scoring metric for the ALASKA I challenge [11].
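All three metrics can be computed directly from an ROC curve. Here is a minimal sketch using scikit-learn; the wAUC follows the integral as written above (no normalization), and the function name is illustrative:

```python
import numpy as np
from sklearn.metrics import roc_curve

def steganalysis_metrics(y_true, y_score):
    """Compute P_E, MD5, and wAUC from labels (1 = stego) and detector scores."""
    p_fa, p_d, _ = roc_curve(y_true, y_score)  # P_FA, P_D along the ROC curve
    p_md = 1.0 - p_d                           # probability of missed detection
    # P_E: minimum average of false-alarm and missed-detection probabilities
    p_e = float(np.min(0.5 * (p_fa + p_md)))
    # MD5: missed-detection probability at a 5% false-alarm rate
    md5 = float(np.interp(0.05, p_fa, p_md))
    # wAUC: area under the ROC curve with the region P_D <= 0.4 weighted 2x
    y = np.where(p_d <= 0.4, 2.0, 1.0) * p_d
    wauc = float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(p_fa)))
    return p_e, md5, wauc
```

For a perfect detector, P_E and MD5 are 0, while wAUC reaches its maximum for the given weighting.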

RW EfficientNets vs ported TF EfficientNets

We also compare the two definitions of the EfficientNet v2 available in the timm library:
They are both pretrained on ImageNet1k. We use the same setting as above.
Noticeable differences: RW_M has a wider stem (32 vs. 24 channels) and a wider final FC layer (2152 vs. 1280), and trains slightly slower.

Conclusions

We show that ImageNet21k is only slightly better (<1%) than ImageNet1k on this dataset, for both long and short training times. JIN is better than both datasets by quite a margin, but this is expected and was reported in [2].
Thanks to Ross Wightman's amazing timm library, which includes these two pretrained models and made this little report possible.

References

[1] Ridnik, T., Ben-Baruch, E., Noy, A. and Zelnik-Manor, L., 2021. ImageNet-21K Pretraining for the Masses. arXiv preprint arXiv:2104.10972.
[2] Butora, J., Yousfi, Y. and Fridrich, J., 2021, June. How to pretrain for steganalysis. In The 9th ACM Workshop on Information Hiding and Multimedia Security, Brussels, Belgium.
[3] Tan, M. and Le, Q.V., 2021. Efficientnetv2: Smaller models and faster training. arXiv preprint arXiv:2104.00298.
[4] Bas, P., Filler, T. and Pevný, T., 2011, May. Break our steganographic system: the ins and outs of organizing BOSS. In International workshop on information hiding (pp. 59-70). Springer, Berlin, Heidelberg.
[5] https://bows2.ec-lille.fr/
[6] Holub, V., Fridrich, J. and Denemark, T., 2014. Universal distortion function for steganography in an arbitrary domain. EURASIP Journal on Information Security, 2014(1), pp.1-13.
[7] Cogranne, R., Giboulot, Q. and Bas, P., 2020, June. Steganography by minimizing statistical detectability: The cases of JPEG and color images. In Proceedings of the 2020 ACM Workshop on Information Hiding and Multimedia Security (pp. 161-167).
[8] Guo, L., Ni, J. and Shi, Y.Q., 2014. Uniform embedding for efficient JPEG steganography. IEEE transactions on Information Forensics and Security, 9(5), pp.814-825.
[9] Yousfi, Y., Butora, J., Fridrich, J. and Tsang, C.F., 2021. Improving EfficientNet for JPEG Steganalysis.
[10] Cogranne, R., Giboulot, Q. and Bas, P., 2020, December. ALASKA#2: Challenging Academic Research on Steganalysis with Realistic Images. In 2020 IEEE International Workshop on Information Forensics and Security (WIFS) (pp. 1-5). IEEE.
[11] Cogranne, R., Giboulot, Q. and Bas, P., 2019, July. The ALASKA steganalysis challenge: A first step towards steganalysis. In Proceedings of the ACM Workshop on Information Hiding and Multimedia Security (pp. 125-137).