How to Handle Images of Different Sizes in a Convolutional Neural Network

Datasets come in all shapes and sizes. CNNs don't like that. Here's how to make it work (with code). Made by Ayush Thakur using Weights & Biases

Convolutional neural networks require identical image sizes to work properly. Of course, in the real world, images are often not uniform. So how exactly do we solve that problem? Can we?


We can! In fact, we can do this in a number of different ways. Most of the techniques can be grouped into two broad classes of solutions, namely: transformations and inherent network properties.
Let's go through each of them one by one, with simple implementations (wherever possible) in TensorFlow 2.x. We'll use the TensorFlow Flowers dataset because it has variable-sized images, and because we get to spend this post looking at soothing pictures of flowers. That's a win-win.
Fig 1: Variable-sized flower dataset

Transformation based techniques

In the case of variable-sized images, we can apply affine transformations to get same-sized images. Two of the simplest are resizing and random cropping. First, resizing: we load the dataset and resize every image to a fixed 224x224 shape.
```python
import tensorflow as tf
import tensorflow_datasets as tfds

AUTO = tf.data.AUTOTUNE
BATCH_SIZE = 256

# Load the flower dataset with an 85/15 train/validation split.
train_ds, validation_ds = tfds.load(
    "tf_flowers",
    split=["train[:85%]", "train[85%:]"],
    as_supervised=True,
)

@tf.function
def scale_resize_image(image, label):
    image = tf.image.convert_image_dtype(image, tf.float32)  # equivalent to dividing image pixels by 255
    image = tf.image.resize(image, (224, 224))  # resize the image to the 224x224 dimension
    return (image, label)

training_ds = (
    train_ds
    .map(scale_resize_image, num_parallel_calls=AUTO)
    .batch(BATCH_SIZE)
    .prefetch(AUTO)
)
```
Here are our resized images (with a few additional examples as well):
Fig 2: Variable-sized images resized to 224x224 dimension.
We can also take random crops. Since images must share a shape before they can be batched, we resize first, then take a random crop of each batched image with tf.image.crop_and_resize:

```python
AUTO = tf.data.AUTOTUNE
BATCH_SIZE = 256

@tf.function
def scale(image, label):
    image = tf.image.convert_image_dtype(image, tf.float32)  # scale pixels to [0, 1] without resizing
    return (image, label)

@tf.function
def random_crop(images, labels):
    # One random normalized [y1, x1, y2, x2] box per image in the batch.
    boxes = tf.random.uniform(shape=(len(images), 4))
    # Crop each image from itself so the labels stay paired correctly.
    box_indices = tf.range(len(images))
    images = tf.image.crop_and_resize(images, boxes, box_indices, (224, 224))
    return images, labels

trainloader = (
    train_ds
    .map(scale_resize_image, num_parallel_calls=AUTO)  # resize first so the images can be batched
    .batch(BATCH_SIZE, drop_remainder=True)  # keep the batch dimension static for len(images)
    .map(random_crop, num_parallel_calls=AUTO)
    .prefetch(AUTO)
)
```
Fig 3: Variable-sized images cropped and resized to 224x224 dimension.

Inherent Network Property

You can also look into networks with an inherent property that makes them immune to the size of the input. Examples include fully convolutional networks (FCNs) and networks that end in a global average/max pooling layer instead of a flatten layer.
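As a minimal sketch of the second idea (this architecture is illustrative, not from the original post): because GlobalAveragePooling2D collapses whatever spatial dimensions it receives into a fixed-length vector, a Keras model can leave the height and width as None in its Input layer and accept images of any size.

```python
import tensorflow as tf

# Illustrative model: the None spatial dims mean any image size is accepted.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, None, 3)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    # Collapses (batch, H, W, 64) -> (batch, 64) regardless of H and W.
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),  # tf_flowers has 5 classes
])

# The same model runs on two differently sized inputs:
print(model(tf.random.normal((1, 100, 150, 3))).shape)  # -> (1, 5)
print(model(tf.random.normal((1, 224, 224, 3))).shape)  # -> (1, 5)
```

Note that images within a single batch still need matching shapes, so a model like this is typically fed variable-sized inputs with a batch size of 1 (or with padding).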


Though CNNs require uniform image sizes, there are a few fairly easy workarounds for taking a dataset full of differently sized pictures and still running ML projects with that data. Broadly, you'll want to either apply data augmentation or transformation to create a dataset of identically sized images, or leverage FCNs or global average/max pooling. It's a small additional step, but it makes dealing with messy real-world data actually possible.

Weights & Biases

Weights & Biases helps you keep track of your machine learning experiments. Use our tool to log hyperparameters and output metrics from your runs, then visualize and compare results and quickly share findings with your colleagues.
Get started in 5 minutes.