
Intuitive understanding of 1D, 2D, and 3D convolutions in convolutional neural networks.

This report gives an intuitive explanation of the difference between 1D, 2D, and 3D convolutions in convolutional neural networks.
Created on August 10|Last edited on March 2

Introduction

In this report, we explain the difference between 1D, 2D, and 3D convolutions in CNNs in terms of two things: the directions in which the kernel moves and the shape of the output.

1D CNN


Fig 1: Operation of 1D CNN


Overview:

  • The convolutional kernel/filter moves in just one direction (say, along the time axis) to calculate the output.
  • The output of each filter is a 1D array.
  • Use cases: Signal Smoothing, Sentence Classification
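
To make the "one direction" idea concrete, here is a minimal NumPy sketch of a 1D convolution written as an explicit loop. The helper name conv1d_valid and the toy signal are illustrative assumptions, not part of the original report. It uses no padding and a stride of 1, and, like TensorFlow's convolution ops, it does not flip the kernel.

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Slide `kernel` along `signal` in one direction (no padding, stride 1)."""
    n_out = len(signal) - len(kernel) + 1
    out = np.empty(n_out, dtype=np.float32)
    for t in range(n_out):  # the only loop: one direction of movement
        out[t] = np.dot(signal[t:t + len(kernel)], kernel)
    return out

signal = np.array([0, 1, 2, 3, 4], dtype=np.float32)
kernel = np.array([1, 1, 1], dtype=np.float32) / 3.0  # equally weighted window

out = conv1d_valid(signal, kernel)
print(out.shape)  # (3,)
print(out)        # approximately [1. 2. 3.]
```

The single for loop is the whole story: the kernel has only one axis to travel along, so the result is a 1D array.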

Implementation:

Here's how we perform 1D convolution in TensorFlow 2.x.
import tensorflow as tf
import numpy as np

inp = np.array([[[0],[1],[2],[3],[4]],[[5],[4],[3],[2],[1]]]).astype(np.float32)
kernel = tf.Variable(tf.initializers.glorot_uniform()([5, 1, 4]), dtype=tf.float32)
out = tf.nn.conv1d(inp, kernel, stride=1, padding='SAME') # Notice the use of 1D conv.

print(inp.shape, kernel.shape, out.shape)
We observe that the shapes are:
  • Input 1D vector - [batch size, width, in channels] (e.g. 2, 5, 1)
  • Convolutional kernel - [width, in channels, out channels] (e.g. 5, 1, 4)
  • Output volume - [batch size, width, out channels] (e.g. 2, 5, 4)

What might this look like in real life?

Let's understand what this is doing with a signal smoothing example. On the left is the original signal, and on the right is the output of a 1D convolution with 3 output channels.
Fig 2: The original signal (left) and the output of the 1D CNN (right)


What do multiple channels mean?

Multiple channels are basically multiple feature representations of an input. In this example, you have three representations obtained by three different filters. The first channel is the equally weighted smoothing filter. The second is a filter that weights the middle of the filter more than the boundaries. The final filter does the opposite of the second. So you can see how these different filters bring about different effects.
Fig 3: Visual representation of 1D convolutional kernel
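
The three filters described above can be sketched with tf.nn.conv1d as follows. The filter weights and the toy signal are assumptions chosen for illustration; the report does not give the exact values behind its figure.

```python
import numpy as np
import tensorflow as tf

# A jagged toy signal, shaped [batch size, width, in channels].
signal = np.array([0, 4, 1, 5, 2, 6, 3, 7], dtype=np.float32).reshape(1, -1, 1)

# Three width-3 filters (illustrative weights, each summing to 1).
filters = np.array([
    [1 / 3, 1 / 3, 1 / 3],  # equally weighted
    [0.25, 0.5, 0.25],      # middle weighted more than the boundaries
    [0.4, 0.2, 0.4],        # boundaries weighted more than the middle
], dtype=np.float32)

# tf.nn.conv1d expects [width, in channels, out channels]:
# transpose to (width, filters), then add the in-channels axis.
kernel = filters.T[:, np.newaxis, :]  # shape (3, 1, 3)

out = tf.nn.conv1d(signal, kernel, stride=1, padding='SAME')
print(signal.shape, kernel.shape, out.shape)  # (1, 8, 1) (3, 1, 3) (1, 8, 3)
```

Each of the three output channels is a differently smoothed version of the same input, which is what "multiple feature representations" means here.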

2D CNN


Fig 4: Operation of 2D CNN


Overview:

  • The convolutional kernel moves in two directions (x, y) to calculate the convolutional output.
  • The output of each filter is a 2D matrix.
  • Use cases: Image Classification, Generating New Images, Image Inpainting, Image Colorization, etc.

Implementation

Here's how we perform 2D convolution in TensorFlow 2.x.
import tensorflow as tf
import numpy as np
from PIL import Image

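# Load the image as grayscale and cast to float32, since tf.nn.conv2d needs
# the input and the kernel to share a floating-point dtype.
im = np.array(Image.open(<some image>).convert('L')).astype(np.float32)  # optionally divide by 255.0
x = np.expand_dims(np.expand_dims(im, 0), -1)  # shape: [batch size, height, width, in channels]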

kernel_init = np.array(
[
[[[-1, 1.0/9, 0]],[[-1, 1.0/9, -1]],[[-1, 1.0/9, 0]]],
[[[-1, 1.0/9, -1]],[[8, 1.0/9,5]],[[-1, 1.0/9,-1]]],
[[[-1, 1.0/9,0]],[[-1, 1.0/9,-1]],[[-1, 1.0/9, 0]]]
])

kernel = tf.Variable(kernel_init, dtype=tf.float32)

out = tf.nn.conv2d(x, kernel, strides=[1,1,1,1], padding='SAME')
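
As with the 1D case, it helps to check the shapes: the input is [batch size, height, width, in channels], the kernel is [height, width, in channels, out channels], and the output is [batch size, height, width, out channels]. The sketch below uses a small random array as a stand-in for a real image and a simple mean filter in every output channel; both are assumptions made so the snippet runs without an image file.

```python
import numpy as np
import tensorflow as tf

# Stand-in for a grayscale image (hypothetical 6x8 random data, not a real photo).
im = np.random.default_rng(0).random((6, 8)).astype(np.float32)
x = im[np.newaxis, :, :, np.newaxis]  # [batch size, height, width, in channels]

# A 3x3 mean filter in each of the 3 output channels is enough to check shapes.
kernel = np.full((3, 3, 1, 3), 1.0 / 9, dtype=np.float32)  # [height, width, in, out]

out = tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='SAME')
print(x.shape, kernel.shape, out.shape)  # (1, 6, 8, 1) (3, 3, 1, 3) (1, 6, 8, 3)
```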

What might this look like in real life?

Here you can see the output produced by the above code. The first image is the original, and going clockwise you have the outputs of the 1st, 2nd, and 3rd filters.
Fig 5: Original image with the output of three kernels

What do multiple channels mean?

In the context of 2D convolution, it is much easier to understand what these multiple channels mean. Say you are doing face recognition. You can think of each filter as representing an eye, a mouth, a nose, and so on (this is a very unrealistic simplification, but it gets the point across). Each feature map would then be a binary representation of whether that feature is present in the image you provided. For a face recognition model, those are very valuable features. More information in this article.
This idea is illustrated below:
Fig 6: Kernel designed to find eye-like features in the image
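
A toy version of this idea can be sketched in a few lines. Here a 3x3 plus sign plays the role of the "eye", and the filter is simply a copy of that pattern; the pattern, the image size, and the location are all assumptions made for illustration, and real networks learn their filters rather than being handed them.

```python
import numpy as np
import tensorflow as tf

# A toy 3x3 "feature" (a plus sign) standing in for something like an eye.
pattern = np.array([[0, 1, 0],
                    [1, 1, 1],
                    [0, 1, 0]], dtype=np.float32)

# Blank 8x8 image with the pattern pasted so its centre is at row 5, column 2.
image = np.zeros((8, 8), dtype=np.float32)
image[4:7, 1:4] = pattern

x = image[np.newaxis, :, :, np.newaxis]         # [1, 8, 8, 1]
kernel = pattern[:, :, np.newaxis, np.newaxis]  # [3, 3, 1, 1]; the filter is the pattern itself

out = tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='SAME')
feature_map = out.numpy()[0, :, :, 0]

# The strongest response marks where the feature sits.
peak = tuple(int(i) for i in np.unravel_index(np.argmax(feature_map), feature_map.shape))
print(peak)  # (5, 2)
```

The feature map is large where the image looks like the filter and small elsewhere, so thresholding it gives the kind of "is this feature present here?" map described above.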

3D CNN


Fig 7: Operation of 3D CNN


Overview

  • The convolutional kernel moves in three directions (x, y, z) to calculate the convolutional output.
  • The output of each filter is a 3D volume.
  • Use case: Conv3D is mostly used with 3D image data such as Magnetic Resonance Imaging (MRI) or Computerized Tomography (CT) scans.

Implementation

Here's how we can perform 3D convolution using a Keras Conv3D layer.
import keras
from keras.layers import Conv3D

model = keras.models.Sequential()

model.add(Conv3D(1, kernel_size=(3,3,3), input_shape = (128, 128, 128, 3)))

model.summary()
Fig 8: Visual representation of 3D Convolutional kernel
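
For comparison with the 1D and 2D snippets, the same shape check can be done with the lower-level tf.nn.conv3d op. The small random volume and the mean-filter kernel below are assumptions chosen to keep the example fast; they are not taken from the report.

```python
import numpy as np
import tensorflow as tf

# Hypothetical small volume: [batch size, depth, height, width, in channels].
volume = np.random.default_rng(0).random((1, 8, 8, 8, 1)).astype(np.float32)

# Kernel: [depth, height, width, in channels, out channels].
kernel = np.full((3, 3, 3, 1, 2), 1.0 / 27, dtype=np.float32)

out = tf.nn.conv3d(volume, kernel, strides=[1, 1, 1, 1, 1], padding='SAME')
print(volume.shape, kernel.shape, out.shape)  # (1, 8, 8, 8, 1) (3, 3, 3, 1, 2) (1, 8, 8, 8, 2)
```

Compared with the 2D case, the input, the kernel, and the output each gain one extra spatial axis (depth), which is the third direction the kernel moves in.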


Summary

The following charts summarize the key differences between 1D, 2D, and 3D convolutional neural networks. Note that the input and output shapes are for TensorFlow.
Fig 9: Input shape for 1D, 2D, and 3D CNN in TensorFlow.

Fig 10: Output shape for 1D, 2D, and 3D CNN in TensorFlow.


Fig 11: Direction of operation for 1D, 2D, and 3D CNN in TensorFlow.