This report explains the difference between "SAME" and "VALID" padding in TensorFlow's `tf.nn.max_pool`. Made by Krisha Mehta using Weights & Biases.

In TensorFlow, `tf.nn.max_pool` performs max pooling on the input. Max pooling downsamples the spatial dimensions of the input to reduce the number of parameters and the computation needed to train the network. It does this by sliding a window over the input and keeping only the maximum value within each window.

`tf.nn.max_pool` takes 6 arguments: `input` (a rank N+2 tensor), `ksize` (the size of the window for each dimension of the input tensor), `strides` (the stride of the sliding window for each dimension of the input tensor), `padding`, and the optional `data_format` and `name`. The `padding` argument takes one of 2 values, either **VALID** or **SAME**, and padding is performed by adding values around the input matrix. The value used for padding is always zero.
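To make the mechanics concrete, here is a minimal NumPy sketch of 2-D max pooling on a single-channel image that mirrors the semantics described above. This is an illustration, not TensorFlow's actual implementation; the function name `max_pool_2d` and its scalar `ksize`/`stride` parameters are simplifications of the real API.

```python
import numpy as np

def max_pool_2d(x, ksize, stride, padding):
    """Sketch of 2-D max pooling on a single-channel image.

    x: 2-D NumPy array; ksize, stride: ints; padding: "VALID" or "SAME".
    Illustrative only -- not TensorFlow's implementation.
    """
    h, w = x.shape
    if padding == "SAME":
        # Output size is ceil(input / stride); pad enough to achieve it.
        out_h = -(-h // stride)
        out_w = -(-w // stride)
        pad_h = max((out_h - 1) * stride + ksize - h, 0)
        pad_w = max((out_w - 1) * stride + ksize - w, 0)
        # Pad with zeros, with the extra row/column on the bottom/right.
        # (For non-negative inputs this matches ignoring the padded cells.)
        x = np.pad(x, ((pad_h // 2, pad_h - pad_h // 2),
                       (pad_w // 2, pad_w - pad_w // 2)))
    else:  # "VALID": no padding, the window stays inside the input
        out_h = (h - ksize) // stride + 1
        out_w = (w - ksize) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + ksize,
                       j * stride:j * stride + ksize]
            out[i, j] = window.max()  # keep only the maximum per window
    return out
```

For example, pooling a 4x4 input with a 2x2 window and stride 2 under "VALID" keeps the maximum of each non-overlapping 2x2 block.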

Let us see how these two are different from each other.

When `padding == "VALID"`, the input image is not padded, so the filter window always stays inside the input image. This type of padding is called *valid* because only the *valid*, original elements of the input image are considered. With `padding == "VALID"`, there can be a loss of information: elements on the right and the bottom of the image tend to be ignored, and how many are ignored depends on the size of the kernel and the stride.

In this case, the size of the output image <= the size of the input image.

If `padding == "VALID"`:

`output_spatial_shape[i] = ceil((input_spatial_shape[i] - (spatial_filter_shape[i] - 1) * dilation_rate[i]) / strides[i])`
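This formula is easy to check with a few lines of Python; the helper name `valid_output_size` below is ours, and `dilation` defaults to 1 for plain max pooling:

```python
import math

def valid_output_size(input_size, filter_size, stride, dilation=1):
    """Output size along one spatial dimension for padding="VALID"."""
    return math.ceil((input_size - (filter_size - 1) * dilation) / stride)

# A 6-wide input with a 3-wide window and stride 1:
print(valid_output_size(6, 3, 1))  # -> 4
```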

When `padding == "SAME"`, the input is zero-padded by roughly half the filter size on each side. The padding type is called SAME because, when the stride is 1, the output size is the same as the input size. Using "SAME" ensures that the filter is applied to all the elements of the input. Padding is normally set to "SAME" while training the model, since the output size stays mathematically convenient for further computation.

If `padding == "SAME"`:

`output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides[i])`
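Again, a one-line helper (our name, not a TensorFlow function) confirms that the filter size drops out of the calculation entirely:

```python
import math

def same_output_size(input_size, stride):
    """Output size along one spatial dimension for padding="SAME"."""
    return math.ceil(input_size / stride)

print(same_output_size(5, 1))  # -> 5
print(same_output_size(5, 2))  # -> 3
```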

Finally, let's visualize the effect both padding options have on our convolutions:

The first figure in the image above shows `padding="VALID"`, where the filter is bounded by the input image.

So if we have an image with:

- Input Size = 4x4
- Kernel Size = 2x2
- Stride Size = 1x1

Applying the formula above, ceil((4 - 1) / 1) = 3, we get an output image of size 3x3.
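We can confirm this directly by taking the maximum of every 2x2 window of a 4x4 array (the concrete input values here are just an example):

```python
import numpy as np

x = np.arange(1, 17).reshape(4, 4)  # example 4x4 input
k, stride = 2, 1
# One output element per valid window position along each dimension.
out = np.array([[x[i:i + k, j:j + k].max()
                 for j in range(0, x.shape[1] - k + 1, stride)]
                for i in range(0, x.shape[0] - k + 1, stride)])
print(out.shape)  # -> (3, 3)
```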

The second figure shows `padding="SAME"`, where the size of the output image is equal to that of the input image (stride=1).

So if we have an image with:

- Input Size = 5x5
- Kernel Size = 5x5
- Stride Size = 1x1

We get an output image of size 5x5.
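For this example we can also compute how much padding "SAME" adds. The sketch below follows TensorFlow's convention of putting the extra element (when the total padding is odd) at the end, i.e. on the bottom/right; the helper name `same_padding` is ours:

```python
import math

def same_padding(input_size, filter_size, stride):
    """Total, leading, and trailing padding along one dimension for "SAME".

    When the total is odd, the extra element goes at the end
    (bottom/right), following TensorFlow's convention.
    """
    out = math.ceil(input_size / stride)
    total = max((out - 1) * stride + filter_size - input_size, 0)
    before = total // 2
    after = total - before
    return total, before, after

# 5x5 input, 5x5 window, stride 1: 4 padded rows/cols, split 2 and 2.
print(same_padding(5, 5, 1))  # -> (4, 2, 2)
```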