How to avoid checkerboard pattern in your generated images?
Issue
If you have trained a generative adversarial network to generate images of 64x64 resolution or more, you might have seen some weird visual artifacts.
These artifacts are repetitive and form a "checkerboard"-like pattern, as shown in Figure 1. While many SOTA generative models exhibit this pattern, it is most visible in strongly-colored generated images like the one shown in Figure 1.
In this report, we will take a practical look at the reason behind the "checkerboard" pattern and at ways to solve it.
TL;DR
If the generator of your GAN or the decoder of your autoencoder uses deconvolution (transposed convolution) layers, replace them with upsampling followed by convolution layers to avoid the "checkerboard" pattern.
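As a framework-agnostic sketch of this swap, here is the upsample-then-convolve pipeline in plain NumPy. The function names, the 2x scale factor, and the 3x3 kernel size are illustrative choices, not fixed by the text; in a real model the kernel would be a learned weight.

```python
import numpy as np

def nearest_upsample(x, scale=2):
    """Nearest-neighbor upsampling: each pixel becomes a scale x scale block."""
    return np.repeat(np.repeat(x, scale, axis=0), scale, axis=1)

def conv2d_same(x, kernel):
    """Naive single-channel 2D convolution with 'same' zero padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

x = np.random.rand(8, 8)          # a small "low-resolution" feature map
kernel = np.random.rand(3, 3)     # stands in for learned conv weights
y = conv2d_same(nearest_upsample(x), kernel)
print(y.shape)  # (16, 16)
```

Because every output pixel is covered by the same number of kernel applications, this construction cannot produce the uneven-overlap pattern discussed below, regardless of the kernel values.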
The thing with Transposed Convolution
The generator in a GAN architecture is required to upsample input data in order to generate an output image. There are multiple techniques to upsample a low-resolution image, such as nearest-neighbor interpolation, bilinear interpolation, and bicubic interpolation. However, these are fixed heuristics and lack any learnable component.
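As a quick illustration of such a fixed (non-learnable) heuristic, here is 2x linear interpolation of a 1-D signal in NumPy; the signal values are arbitrary:

```python
import numpy as np

# Upsample a 1-D signal 2x with (non-learnable) linear interpolation.
x = np.array([0.0, 2.0, 4.0, 8.0])
coarse = np.arange(len(x))                          # original sample positions
fine = np.linspace(0, len(x) - 1, 2 * len(x) - 1)   # 2x denser grid
y = np.interp(fine, coarse, x)
print(y)  # [0. 1. 2. 3. 4. 6. 8.]
```

Nothing here is trained: the in-between values are a fixed function of their neighbors, which is exactly the "learnable aspect" that interpolation lacks.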
Transposed convolution, also known as deconvolution, is a popular layer for upsampling a low-resolution image to a higher resolution using learnable kernels. This layer both upsamples the input and learns how to fill in details during training: it allows the model to use every point in the small image to “paint” a square in the larger one. For an excellent discussion of deconvolution, check out "Is the deconvolution layer the same as a convolutional layer?" and "Checkerboard artifact free sub-pixel convolution".
Unfortunately, deconvolution can easily have "uneven overlap", which appears as checkerboard artifacts. This can be seen even with randomly initialized weights of the deconvolution layer, as shown in Figure 2.
- Theoretically, these layers could learn weights that overcome the artifacts, but thorough investigation has shown that deconvolution is naturally inclined toward such high-frequency artifacts.
- When deconvolution layers are stacked on top of each other, their artifacts could in principle cancel out, but that would require a very carefully designed architecture; in practice, stacking them compounds the artifact effect.
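The uneven overlap can be made concrete by counting how many kernel applications contribute to each output pixel of a transposed convolution. A minimal 1-D sketch (the sizes here are illustrative):

```python
import numpy as np

def transposed_conv_coverage(in_size, kernel, stride):
    """Count how many kernel applications touch each output pixel
    of a 1-D transposed convolution (no padding)."""
    out_size = (in_size - 1) * stride + kernel
    coverage = np.zeros(out_size, dtype=int)
    for i in range(in_size):
        coverage[i * stride : i * stride + kernel] += 1
    return coverage

# kernel=3, stride=2: kernel size is NOT divisible by the stride.
print(transposed_conv_coverage(5, kernel=3, stride=2))
# → [1 1 2 1 2 1 2 1 2 1 1]
```

The alternating 1/2 coverage means adjacent output pixels receive systematically different amounts of "paint"; in 2-D the row and column patterns multiply, producing the checkerboard.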
How can we avoid such artifacts? What's a suitable alternative?
Solutions to the Issue
- A straightforward solution is to use deconvolution without uneven overlap by ensuring that the kernel size is divisible by the stride.
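One way to see why this divisibility condition helps is to count how many kernel applications touch each output pixel of a transposed convolution. With kernel size 4 and stride 2 (a common pairing, e.g. in DCGAN-style generators), every interior output pixel receives the same number of contributions. A minimal 1-D sketch with illustrative sizes:

```python
import numpy as np

def transposed_conv_coverage(in_size, kernel, stride):
    """Count how many kernel applications touch each output pixel
    of a 1-D transposed convolution (no padding)."""
    out_size = (in_size - 1) * stride + kernel
    coverage = np.zeros(out_size, dtype=int)
    for i in range(in_size):
        coverage[i * stride : i * stride + kernel] += 1
    return coverage

# kernel=4, stride=2: kernel size IS divisible by the stride, so the
# interior coverage is uniform (only the borders differ).
print(transposed_conv_coverage(5, kernel=4, stride=2))
```

With uniform interior coverage there is no built-in alternation for the artifacts to latch onto, though the layer can still learn uneven kernels, so this mitigates rather than fully eliminates the problem.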