Intuitive understanding of 1D, 2D, and 3D convolutions in convolutional neural networks.
This report explains, intuitively, the difference between 1D, 2D, and 3D convolutions in convolutional neural networks.
Created on August 10 | Last edited on March 2
Introduction
In this report, we explain the difference between 1D, 2D, and 3D convolutions in CNNs in terms of the direction in which the convolutional kernel moves and the shape of the output.
1D CNN

Fig 1: Operation of 1D CNN
Overview:
- The convolutional kernel/filter moves in just one direction (say, along the time axis) to calculate the output.
- The output of each filter is a 1D array.
Implementation:
Here's how we perform 1D convolution in TensorFlow 2.x.
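```python
import numpy as np
import tensorflow as tf

# Input: [batch size, width, in channels] = (2, 5, 1)
inp = np.array([[[0], [1], [2], [3], [4]],
                [[5], [4], [3], [2], [1]]]).astype(np.float32)
# Kernel: [width, in channels, out channels] = (5, 1, 4), Glorot-uniform initialised
kernel = tf.Variable(tf.keras.initializers.GlorotUniform()(shape=[5, 1, 4]), dtype=tf.float32)
out = tf.nn.conv1d(inp, kernel, stride=1, padding='SAME')  # Notice the use of 1D conv.
print(inp.shape, kernel.shape, out.shape)
```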
We observe that the shapes are:
- Input 1D vector - [batch size, width, in channels] (e.g. 2, 5, 1)
- Convolutional kernel - [width, in channels, out channels] (e.g. 5, 1, 4)
- Output - [batch size, width, out channels] (e.g. 2, 5, 4)
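To see what `tf.nn.conv1d` computes, here is a minimal NumPy sketch of a stride-1 1D convolution with 'SAME' padding, built from the shape conventions listed above. `conv1d_same` is a hypothetical helper, not a TensorFlow API. Like TensorFlow's conv ops it computes a cross-correlation (the kernel is not flipped), and it replaces the Glorot-uniform initialiser with a plain seeded uniform draw, an assumption made only to keep the example self-contained.

```python
import numpy as np

def conv1d_same(x, w):
    """Stride-1 1D convolution with TensorFlow-style 'SAME' padding.

    x: [batch, width, in_channels]
    w: [kernel_width, in_channels, out_channels]
    returns: [batch, width, out_channels]
    """
    batch, width, _ = x.shape
    k, _, out_ch = w.shape
    pad_total = k - 1                  # 'SAME' at stride 1 keeps the width unchanged
    pad_left = pad_total // 2
    pad_right = pad_total - pad_left
    xp = np.pad(x, ((0, 0), (pad_left, pad_right), (0, 0)))  # zero padding on the width axis
    out = np.zeros((batch, width, out_ch), dtype=x.dtype)
    for i in range(width):             # the kernel slides along one axis only
        window = xp[:, i:i + k, :]     # [batch, k, in_channels]
        out[:, i, :] = np.einsum("bkc,kco->bo", window, w)
    return out

inp = np.array([[[0], [1], [2], [3], [4]],
                [[5], [4], [3], [2], [1]]], dtype=np.float32)
# Assumption: a seeded uniform kernel stands in for Glorot-uniform initialisation.
kernel = np.random.default_rng(0).uniform(-0.5, 0.5, size=(5, 1, 4)).astype(np.float32)
out = conv1d_same(inp, kernel)
print(inp.shape, kernel.shape, out.shape)  # (2, 5, 1) (5, 1, 4) (2, 5, 4)
```

With a width-5 kernel, 'SAME' padding adds two zeros on each side, so the output width stays 5 and only the channel axis changes from 1 to 4.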
What might this look like in real life?
Let's understand what this is doing with a signal-smoothing example. On the left is the original signal, and on the right is the output of a 1D convolution with three output channels.

Fig 2: Left, the original signal; right, the output of the 1D CNN
What do multiple channels mean?
Multiple channels are multiple feature representations of the same input. In this example, you have three representations obtained with three different filters. The first channel comes from an equally weighted smoothing filter. The second comes from a filter that weights the middle of its window more than the edges. The final filter does the opposite of the second, weighting the edges more than the middle. You can see how these different filters produce different effects.
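The three-channel smoothing idea can be sketched in NumPy as below. The filter weights are illustrative assumptions chosen to match the description (equal weights, centre-heavy, edge-heavy), and the noisy sine wave is a stand-in for the signal in Fig 2; neither is taken from the original figure.

```python
import numpy as np

# Hypothetical 3-tap smoothing filters, one per output channel; each sums to 1.
filters = np.array([
    [1/3, 1/3, 1/3],    # channel 0: equally weighted moving average
    [0.25, 0.5, 0.25],  # channel 1: weights the middle tap more than the edges
    [0.4, 0.2, 0.4],    # channel 2: weights the edges more than the middle
])

rng = np.random.default_rng(42)
t = np.linspace(0, 2 * np.pi, 100)
signal = np.sin(t) + 0.3 * rng.standard_normal(t.size)  # noisy 1D input

# Each filter slides along the time axis. The filters are symmetric, so
# np.convolve (which flips the kernel) matches the cross-correlation CNNs use.
# Stacking the results gives a [width, channels] output.
channels = np.stack([np.convolve(signal, f, mode="same") for f in filters], axis=-1)
print(signal.shape, channels.shape)  # (100,) (100, 3)
```

Plotting each column of `channels` against `t` would show three differently smoothed versions of the same signal.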

Fig 3: Visual representation of 1D convolutional kernel
2D CNN

Fig 4: Operation of 2D CNN
Overview:
- The convolutional kernel moves in two directions (x, y) to calculate the convolutional output.
- The output of each filter is a 2D matrix.
- Use cases: Image Classification, Generating New Images, Image Inpainting, Image Colorization, etc.
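As a sketch of how a kernel moves in two directions, the hypothetical helper `conv2d_valid` below slides a single 3x3 kernel over a small NumPy array with stride 1 and no padding ('VALID'). Like the convolution layers in deep-learning libraries, it is really a cross-correlation. The ramp-shaped input and the choice of a mean filter are assumptions for illustration.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Stride-1 2D cross-correlation of one channel with 'VALID' padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):        # the kernel moves down the image (y)...
        for x in range(out_w):    # ...and across it (x)
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # 5x5 "image" with pixels ramping 0..24
box_blur = np.full((3, 3), 1 / 9)                 # 3x3 mean filter
blurred = conv2d_valid(image, box_blur)
print(blurred.shape)  # (3, 3): the output is a 2D matrix
```

On this linear ramp, each output value equals the centre pixel of its 3x3 window, because the mean of a linear ramp over a symmetric window is its centre value.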
Implementation
Here's how we perform 2D convolution in TensorFlow 2.x.
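```python
import numpy as np
import tensorflow as tf
from PIL import Image

# Load a grayscale image and cast to float32 to match the kernel dtype (optionally /255.0)
im = np.array(Image.open("<some image>").convert('L')).astype(np.float32)
x = np.expand_dims(np.expand_dims(im, 0), -1)  # [height, width] -> [1, height, width, 1]

# Kernel: [height, width, in channels, out channels] = (3, 3, 1, 3)
kernel_init = np.array([[[[-1, 1.0/9, 0]], [[-1, 1.0/9, -1]], [[-1, 1.0/9, 0]]],
                        [[[-1, 1.0/9, -1]], [[8, 1.0/9, 5]], [[-1, 1.0/9, -1]]],
                        [[[-1, 1.0/9, 0]], [[-1, 1.0/9, -1]], [[-1, 1.0/9, 0]]]])
kernel = tf.Variable(kernel_init, dtype=tf.float32)
out = tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='SAME')  # [1, height, width, 3]
```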
What might this look like in real life?
Here you can see the output produced by the above code. The first image is the original; going clockwise, you have the outputs of the 1st, 2nd, and 3rd filters.

Fig 5: Original image with the output of three kernels
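To connect the three output images to the code, the sketch below pulls each 3x3 filter out of `kernel_init` (indexed as [height, width, in channels, out channels]) and applies it to a uniform patch. The filter names are our reading of the weights, not labels from the original code: filter 0 looks like a Laplacian-style edge detector, filter 1 is a 3x3 mean blur, and filter 2 is a common sharpening kernel.

```python
import numpy as np

# kernel_init copied from the 2D snippet: shape (3, 3, 1, 3)
kernel_init = np.array([[[[-1, 1.0/9, 0]], [[-1, 1.0/9, -1]], [[-1, 1.0/9, 0]]],
                        [[[-1, 1.0/9, -1]], [[8, 1.0/9, 5]], [[-1, 1.0/9, -1]]],
                        [[[-1, 1.0/9, 0]], [[-1, 1.0/9, -1]], [[-1, 1.0/9, 0]]]])

names = ["edge detector", "3x3 mean blur", "sharpen"]  # our interpretation of the weights
flat = np.full((3, 3), 10.0)                           # a uniform patch with no edges

responses = []
for k, name in enumerate(names):
    f = kernel_init[:, :, 0, k]          # the 3x3 weights of output channel k
    responses.append(np.sum(flat * f))   # response of filter k on the flat patch
    print(f"filter {k} ({name}): weights sum to {f.sum():.2f}\n{f}")
```

Filter 0's weights sum to 0, so it responds only where intensity changes (edges), while filters 1 and 2 sum to 1 and therefore preserve the brightness of flat regions.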
What do multiple channels mean?
In the context of 2D convolution, it is much easier to understand what multiple channels mean. Say you are doing face recognition. You can think of each filter as representing an eye, a mouth, a nose, and so on (a very unrealistic simplification, but it gets the point across), so that each feature map is a binary representation of whether that feature is present in the image you provided. Needless to say, these are very valuable features for a face-recognition model. More information is available in this article.
This can be illustrated as follows:

Fig 6: Kernel designed to find eye-like features in the image
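The "eye detector" intuition can be made concrete with a toy example. The sketch below assumes a hypothetical 3x3 "eye" pattern (a dark centre in a bright ring), plants it in an otherwise empty 8x8 image, slides a zero-mean version of the pattern over the image, and thresholds the resulting feature map into a binary map. Real CNN filters are learned rather than hand-designed, and the 0.5 threshold is an arbitrary choice.

```python
import numpy as np

# Hypothetical toy "eye": a dark centre surrounded by a bright ring.
eye = np.array([[1, 1, 1],
                [1, 0, 1],
                [1, 1, 1]], dtype=float)
kernel = eye - eye.mean()   # zero-mean, so uniform regions give no response

image = np.zeros((8, 8))
image[2:5, 4:7] = eye       # plant the pattern with its top-left corner at (2, 4)

# Slide the kernel over every 3x3 window (stride 1, 'VALID' cross-correlation).
h, w = image.shape[0] - 2, image.shape[1] - 2
feature_map = np.array([[np.sum(image[y:y + 3, x:x + 3] * kernel)
                         for x in range(w)] for y in range(h)])

# Crude binary "is there an eye here?" map.
detected = feature_map > 0.5 * feature_map.max()
print(np.argwhere(detected))
```

Only the window aligned with the planted pattern survives the threshold; partially overlapping windows score lower, and windows whose centre lands on the bright ring score negatively.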
3D CNN

Fig 7: Operation of 3D CNN
Overview
- The convolutional kernel moves in three directions (x, y, z) to calculate the convolutional output.
- The output of each filter is a 3D volume.
- Use cases: Conv3D is mostly used with 3D image data, such as Magnetic Resonance Imaging (MRI) or Computerized Tomography (CT) scans.
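Extending the 2D idea by one axis gives a 3D convolution. The hypothetical helper `conv3d_valid` below slides a 3x3x3 kernel along the depth, height, and width of a random 8x8x8 volume (a stand-in for a small stack of scan slices), with stride 1 and no padding. The mean-filter kernel is an assumption for illustration.

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Stride-1 3D cross-correlation of one channel with 'VALID' padding."""
    kd, kh, kw = kernel.shape
    od = volume.shape[0] - kd + 1
    oh = volume.shape[1] - kh + 1
    ow = volume.shape[2] - kw + 1
    out = np.zeros((od, oh, ow))
    for z in range(od):              # the kernel moves through depth (z)...
        for y in range(oh):          # ...down (y)...
            for x in range(ow):      # ...and across (x)
                out[z, y, x] = np.sum(volume[z:z + kd, y:y + kh, x:x + kw] * kernel)
    return out

volume = np.random.default_rng(0).random((8, 8, 8))  # hypothetical tiny 3D scan
kernel = np.full((3, 3, 3), 1 / 27)                  # 3x3x3 mean filter
filtered = conv3d_valid(volume, kernel)
print(volume.shape, "->", filtered.shape)            # (8, 8, 8) -> (6, 6, 6)
```

Each of the three spatial axes shrinks by kernel size minus one, so the result is still a 3D volume.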
Implementation
Here's how we can perform 3D convolution in Keras.
```python
import keras
from keras.layers import Conv3D

model = keras.models.Sequential()
model.add(Conv3D(1, kernel_size=(3, 3, 3), input_shape=(128, 128, 128, 3)))
model.summary()
```
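The output shape and parameter count that `model.summary()` reports for this model can be worked out by hand. The sketch below assumes the `Conv3D` defaults of `padding='valid'`, `strides=1`, and `use_bias=True`; the two helper functions are hypothetical and only restate the standard formulas.

```python
def conv_output_size(n, k, stride=1):
    """Output length along one axis with 'valid' padding (no zero padding)."""
    return (n - k) // stride + 1

def conv_param_count(kernel_size, in_channels, filters, use_bias=True):
    """prod(kernel_size) * in_channels * filters weights, plus one bias per filter."""
    weights = in_channels * filters
    for k in kernel_size:
        weights *= k
    return weights + (filters if use_bias else 0)

input_shape = (128, 128, 128, 3)  # (depth, height, width, channels), as in the snippet
kernel_size = (3, 3, 3)
filters = 1

spatial = tuple(conv_output_size(n, k) for n, k in zip(input_shape[:3], kernel_size))
print(spatial + (filters,))                                    # (126, 126, 126, 1)
print(conv_param_count(kernel_size, input_shape[3], filters))  # 82
```

So we would expect the summary to list an output shape of (None, 126, 126, 126, 1) and 82 trainable parameters: 3x3x3x3 = 81 weights plus one bias.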

Fig 8: Visual representation of 3D Convolutional kernel
Summary
The following charts summarize the key differences between 1D, 2D, and 3D convolutional neural networks. Note that the input and output shapes are for TensorFlow.

Fig 9: Input shape for 1D, 2D, and 3D CNN in TensorFlow.

Fig 10: Output shape for 1D, 2D, and 3D CNN in TensorFlow.

Fig 11: Direction of operation for 1D, 2D, and 3D CNN in TensorFlow.
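The shape conventions in these charts can also be summarised in a few lines of Python. This is a sketch assuming the default channels-last layout of TensorFlow's `tf.nn.conv1d`, `tf.nn.conv2d`, and `tf.nn.conv3d`, stride 1, and 'SAME' padding (so spatial sizes are preserved). `conv_shapes` is a hypothetical helper, and the 2D image size is made up for illustration.

```python
def conv_shapes(batch, spatial, in_channels, out_channels, kernel):
    """Channels-last (input, kernel, output) shapes for an N-D conv, 'SAME' padding, stride 1."""
    if len(spatial) != len(kernel):
        raise ValueError("kernel needs one size per spatial axis")
    input_shape = (batch, *spatial, in_channels)
    kernel_shape = (*kernel, in_channels, out_channels)
    output_shape = (batch, *spatial, out_channels)  # 'SAME' + stride 1 keeps spatial sizes
    return input_shape, kernel_shape, output_shape

examples = {
    "1D": conv_shapes(2, (5,), 1, 4, (5,)),                 # the tf.nn.conv1d example
    "2D": conv_shapes(1, (256, 256), 1, 3, (3, 3)),          # hypothetical 256x256 grayscale image
    "3D": conv_shapes(1, (128, 128, 128), 3, 1, (3, 3, 3)),  # 'SAME' padding, unlike the Keras example
}
for name, (i, k, o) in examples.items():
    print(f"{name}: input {i}, kernel {k}, output {o}")
```

Going from 1D to 3D only adds spatial axes between the batch and channel dimensions; the kernel gains the same axes in front of its channel dimensions.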