Difference Between ‘SAME’ and ‘VALID’ Padding in TensorFlow

This report explains the difference between ‘SAME’ and ‘VALID’ padding in tf.nn.max_pool in TensorFlow. Made by Krisha Mehta using Weights & Biases.

In TensorFlow, tf.nn.max_pool performs max pooling on the input. Max pooling downsamples the spatial dimensions of the input, reducing the number of parameters and the amount of computation needed to train the network. It does this by sliding a window over the input and keeping only the maximum value inside each window.

tf.nn.max_pool takes 6 arguments: input (a tensor of rank N+2), ksize (the size of the pooling window for each dimension of the input tensor), strides (the stride of the sliding window for each dimension of the input tensor), padding, and the optional data_format and name. The padding argument takes one of 2 values, "VALID" or "SAME", and padding is performed by adding values around the input. For convolutions the value used for padding is always zero; for max pooling the padded positions never contribute to the output (they behave as if padded with negative infinity, so a padded value can never be the maximum).
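As a quick illustration, here is a minimal sketch of a call (the tensor shape and pooling parameters are arbitrary, chosen only to show the two padding options side by side):

import tensorflow as tf

# A 4-D input tensor in NHWC layout: (batch, height, width, channels).
x = tf.random.normal([1, 5, 5, 1])

# A 2x2 pooling window with stride 2 and no padding.
valid = tf.nn.max_pool(input=x, ksize=2, strides=2, padding="VALID")

# The same pooling, but the input is padded so the window covers every element.
same = tf.nn.max_pool(input=x, ksize=2, strides=2, padding="SAME")

print(valid.shape)  # (1, 2, 2, 1)
print(same.shape)   # (1, 3, 3, 1)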

Let us see how these two are different from each other.

VALID

When padding == "VALID", the input image is not padded, so the filter window always stays inside the input image. This type of padding is called valid because only the valid, original elements of the input image are considered. With "VALID" padding there can be a loss of information: elements on the right and the bottom of the image tend to be ignored, and how many are ignored depends on the size of the kernel and the stride.

In this case, the size of the output image is at most the size of the input image.

If padding == "VALID":

output_spatial_shape[i] = ceil((input_spatial_shape[i] - (spatial_filter_shape[i]-1) * dilation_rate[i]) / strides[i])
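To make the formula concrete, here is a small helper that applies it along one spatial dimension (a sketch; the function name is ours, not part of TensorFlow):

import math

def valid_output_size(input_size, filter_size, stride, dilation=1):
    # Output size along one spatial dimension for padding="VALID".
    return math.ceil((input_size - (filter_size - 1) * dilation) / stride)

print(valid_output_size(5, 3, 2))   # 2
print(valid_output_size(13, 6, 5))  # 2 -- the last two columns of the input are never visited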

SAME


When padding == "SAME", the input is "half" padded: roughly half the filter size is added on each side. The padding type is called SAME because, when stride = 1, the output size is the same as the input size. Using "SAME" ensures that the filter is applied to all the elements of the input. Padding is commonly set to "SAME" while training a model, since the resulting output size is mathematically convenient for further computation.

If padding == "SAME":

output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides[i])
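The corresponding helper for "SAME" (again a sketch, with a name of our own) depends only on the input size and the stride:

import math

def same_output_size(input_size, stride):
    # Output size along one spatial dimension for padding="SAME".
    return math.ceil(input_size / stride)

print(same_output_size(5, 1))  # 5 -- the output size equals the input size when stride=1
print(same_output_size(5, 2))  # 3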

Visualize padding with an example

Finally, let's visualize the effect both padding options have on our convolutions:

[Animation: a filter sliding over an input image with padding="VALID" and with padding="SAME"] (Image Source)

The first figure in the animation above shows padding="VALID", where the filter stays within the bounds of the input image.

So if, for instance, we have a 5x5 input image, a 3x3 filter, and a stride of 2, the VALID formula gives ceil((5 - (3 - 1)) / 2) = ceil(3 / 2) = 2, i.e. an output image of size 2x2.

The second figure shows padding="SAME", where the size of the output image is equal to that of the input image (stride=1).

So if we have the same 5x5 input image and a stride of 1, the SAME formula gives ceil(5 / 1) = 5, i.e. an output image of size 5x5.
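Both results can be verified directly in TensorFlow; a minimal sketch using a 3x3 window, with the strides matching the two cases above:

import tensorflow as tf

# A 5x5 single-channel image in NHWC layout: (batch, height, width, channels).
image = tf.reshape(tf.range(25, dtype=tf.float32), [1, 5, 5, 1])

# "VALID" with a 3x3 window and stride 2: the window stays inside the image.
valid = tf.nn.max_pool(image, ksize=3, strides=2, padding="VALID")
print(valid.shape)  # (1, 2, 2, 1)

# "SAME" with a 3x3 window and stride 1: the output size matches the input.
same = tf.nn.max_pool(image, ksize=3, strides=1, padding="SAME")
print(same.shape)   # (1, 5, 5, 1)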