This informal report describes short experiments comparing ImageNet21k and ImageNet1k pretraining on a steganalysis task. Written by Yassine Yousfi using Weights & Biases.

Steganography is the art of covert communication in which secrets are hidden in ordinary-looking cover objects, in this case JPEG images. Typically, secret messages are embedded in JPEG cover images by applying ±1 changes to the quantized DCT coefficients.
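As a toy illustration of what such an embedding change looks like, the sketch below flips a random subset of quantized DCT coefficients by ±1. This is a uniform random simulation for intuition only, not a published embedding scheme (real schemes like J-UNIWARD choose the changes far more carefully); the `change_rate` parameter is a hypothetical knob, not a value from this report.

```python
import numpy as np

# Toy simulation of +/-1 embedding (illustration only, not a published
# scheme): flip a random subset of quantized DCT coefficients by +1 or -1.
# "change_rate" is a hypothetical parameter.
def simulate_pm1_embedding(dct_coeffs, change_rate=0.05, seed=0):
    rng = np.random.default_rng(seed)
    stego = dct_coeffs.copy()
    mask = rng.random(stego.shape) < change_rate   # which coefficients change
    signs = rng.choice([-1, 1], size=stego.shape)  # direction of each change
    stego[mask] += signs[mask]
    return stego

# one 8x8 block of fake quantized DCT coefficients
cover = np.random.default_rng(1).integers(-50, 50, size=(8, 8))
stego = simulate_pm1_embedding(cover)
assert np.all(np.abs(stego - cover) <= 1)  # every change is exactly +/-1
```

Even with such small changes, the cover and stego arrays differ, which is exactly what a steganalysis detector must pick up on.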

Steganalysis is the detection of these steganographic messages. It is usually formulated as a binary classification problem: given a set of cover and stego images, security is determined by how easily a knowledgeable steganalyst can separate the two classes (cf. Kerckhoffs's principle).

Below, you'll find an example of a cover/stego pair. Notice how the two images are visually (and statistically) nearly indistinguishable, while the stego image contains a simulated message of about 1 kbit. Also notice that DCT blocks in the clear-sky region carry fewer changes than those in the more textured regions of the image; this is called content-adaptive steganography.

ImageNet21k is a larger version of ImageNet: it contains around 12.4M images (after cleaning), roughly ten times the ~1.3M images in ImageNet1k. More details about ImageNet21k can be found in [1].

In [1], the authors show that for certain downstream tasks, pretraining on ImageNet21k achieves better performance than pretraining on ImageNet1k. In this report, we study the effect of pretraining on a steganalysis task. We compare a few different pretraining strategies of the same architectures:

- ImageNet21k
- ImageNet1k
- JIN (J-UNIWARD ImageNet) [2]

(Note that the last pretraining strategy is a rather unfair comparison to the first two. JIN is a steganalysis-specific dataset and is very semantically close to the downstream task.)

Some additional details:

- We use EfficientNet V2 L [3] as our CNN architecture.
- We perform transfer learning with the same hyper-parameters described in Section 4.2 of [9], the only difference being the use of the OneCycle LR scheduler with a linear warmup of 4 epochs.
- We compare the two pretrained models on a regular training schedule (60 epochs) and a short schedule (20 epochs).
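For illustration, the shape of such a learning-rate schedule (linear warmup for 4 epochs followed by OneCycle-style cosine annealing) can be sketched in a few lines. The learning-rate values below are hypothetical placeholders, not the hyper-parameters from [9]; in practice one would use a library scheduler such as PyTorch's `OneCycleLR`.

```python
import math

# Sketch of the schedule shape (assumption: linear warmup, then cosine
# annealing as in OneCycle's decay phase). max_lr / final_lr are
# hypothetical placeholders, not values from the report.
def lr_at_epoch(epoch, total_epochs=60, warmup_epochs=4,
                max_lr=1e-3, final_lr=1e-6):
    if epoch < warmup_epochs:
        # linear warmup from max_lr / warmup_epochs up to max_lr
        return max_lr * (epoch + 1) / warmup_epochs
    # cosine decay from max_lr down to final_lr
    t = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return final_lr + 0.5 * (max_lr - final_lr) * (1 + math.cos(math.pi * t))

schedule = [lr_at_epoch(e) for e in range(60)]
```

The short 20-epoch schedule would use the same shape with `total_epochs=20`, so the warmup occupies a proportionally larger fraction of training.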

And finally, we use the following performance metrics to compare the detectors:

P_{\mathrm{E}}=\min_{P_{\mathrm{FA}}}\frac{1}{2}\left(P_{\mathrm{FA}}+P_{\mathrm{MD}}(P_{\mathrm{FA}})\right)\\
\text{MD5}=P_{\text{MD}}(P_{\text{FA}}=0.05)\\
\text{wAUC}=\int_{0}^{1}w(P_{\mathrm{D}}(P_{\mathrm{FA}}))\,P_{\mathrm{D}}(P_{\mathrm{FA}})\,\mathrm{d}P_{\mathrm{FA}}

where P_\mathrm{D} and P_\mathrm{FA} are the probability of detection and the probability of false alarm on the validation set, and P_\mathrm{MD}=1-P_\mathrm{D} is the probability of missed detection. wAUC is a weighted AUC: w(P_{\mathrm{D}}) is a weighting function that gives the region where P_\mathrm{D} is between 0 and 0.4 a weight of 2 (and weight 1 elsewhere). The wAUC was the scoring metric for the ALASKA II Kaggle competition [10], while MD5 was the scoring metric for the ALASKA I challenge [11].
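As a rough illustration, all three metrics can be computed from per-image detector scores via an empirical ROC. This is our own sketch, not the evaluation code behind this report; the score convention (higher = more likely stego), the function names, and the toy scores are all assumptions.

```python
import numpy as np

# Build an empirical ROC from per-image scores (assumption: higher = stego).
def empirical_roc(cover_scores, stego_scores):
    thresholds = np.sort(np.concatenate([cover_scores, stego_scores]))[::-1]
    p_fa = np.array([(cover_scores >= t).mean() for t in thresholds])
    p_d = np.array([(stego_scores >= t).mean() for t in thresholds])
    return p_fa, p_d

def trapezoid(y, x):
    # trapezoidal integration (avoids the np.trapz/np.trapezoid rename)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def p_e(p_fa, p_d):
    # total error under equal priors, minimized over detection thresholds
    return float(np.min(0.5 * (p_fa + (1.0 - p_d))))

def md5(p_fa, p_d):
    # missed-detection probability at a 5% false-alarm rate
    return 1.0 - float(np.interp(0.05, p_fa, p_d))

def wauc(p_fa, p_d):
    # integral of w(P_D) * P_D over P_FA, with weight 2 where P_D < 0.4
    w = np.where(p_d < 0.4, 2.0, 1.0)
    return trapezoid(w * p_d, p_fa)

# toy, perfectly separable scores: a perfect detector has P_E = MD5 = 0
cover = np.array([0.1, 0.2, 0.3])
stego = np.array([0.7, 0.8, 0.9])
fa, d = empirical_roc(cover, stego)
print(p_e(fa, d), md5(fa, d))  # -> 0.0 0.0
```

Note that the wAUC here is left unnormalized, exactly as the formula above is written; the Kaggle competition additionally rescaled it to lie in [0, 1].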

We also compare the two definitions of EfficientNet V2 available in the timm library:

- An unofficial model definition trained by Ross Wightman before the official code release
- The model ported from the official TensorFlow implementation

They are both pretrained on ImageNet1k. We use the same setting as above.

Noticeable differences: RW_M has a wider stem (32 vs. 24), a wider FC layer (2152 vs. 1280), and slightly slower training.

We show that ImageNet21k pretraining is only slightly better (<1%) than ImageNet1k on this dataset, for both the long and short training schedules. JIN is better than both by quite a margin, but this is expected and was already reported in [2].

Thanks to Ross Wightman's amazing timm library, which includes these two pretrained models and made this little report possible.

[1] Ridnik, T., Ben-Baruch, E., Noy, A. and Zelnik-Manor, L., 2021. ImageNet-21K Pretraining for the Masses. arXiv preprint arXiv:2104.10972.

[2] Butora, J., Yousfi, Y. and Fridrich, J., 2021, June. How to pretrain for steganalysis. In The 9th ACM Workshop on Information Hiding and Multimedia Security, Brussels, Belgium.

[3] Tan, M. and Le, Q.V., 2021. Efficientnetv2: Smaller models and faster training. arXiv preprint arXiv:2104.00298.

[4] Bas, P., Filler, T. and Pevný, T., 2011, May. Break our steganographic system: the ins and outs of organizing BOSS. In International workshop on information hiding (pp. 59-70). Springer, Berlin, Heidelberg.

[6] Holub, V., Fridrich, J. and Denemark, T., 2014. Universal distortion function for steganography in an arbitrary domain. EURASIP Journal on Information Security, 2014(1), pp.1-13.

[7] Cogranne, R., Giboulot, Q. and Bas, P., 2020, June. Steganography by minimizing statistical detectability: The cases of JPEG and color images. In Proceedings of the 2020 ACM Workshop on Information Hiding and Multimedia Security (pp. 161-167).

[8] Guo, L., Ni, J. and Shi, Y.Q., 2014. Uniform embedding for efficient JPEG steganography. IEEE transactions on Information Forensics and Security, 9(5), pp.814-825.

[9] Yousfi, Y., Butora, J., Fridrich, J. and Tsang, C.F., 2021. Improving EfficientNet for JPEG Steganalysis.

[10] Cogranne, R., Giboulot, Q. and Bas, P., 2020, December. ALASKA#2: Challenging Academic Research on Steganalysis with Realistic Images. In 2020 IEEE International Workshop on Information Forensics and Security (WIFS) (pp. 1-5). IEEE.

[11] Cogranne, R., Giboulot, Q. and Bas, P., 2019, July. The ALASKA steganalysis challenge: A first step towards steganalysis. In Proceedings of the ACM Workshop on Information Hiding and Multimedia Security (pp. 125-137).