Open In Colab

Image colorization is an ill-posed problem: there are multiple plausible ways to colorize an object. A car in a black-and-white photo could plausibly be red, blue, or gray. This report explores an interesting deep learning framework that achieves instance-aware colorization.

Paper | Code | Google Colab

Upload black-and-white images and download the colorized results in the linked Colab notebook.


Image colorization is a fascinating deep learning task: automatically predicting the missing color channels from a given single-channel grayscale image. There exist many plausible ways to color a grayscale image, which makes it a challenging problem. It is also a prevalent pretext task for image representation learning. Learn more about that in the Unsupervised Visual Representation Learning with SwAV report.

Several image colorization techniques already exist, ranging from user-guided approaches to fully automatic learning-based methods.


-> Figure 1: A clear separation of the object (orange) and background leads to a more convincing colorization. (Source) <-

Try out the Colab notebook to colorize your own grayscale images.

Open In Colab

Overview of the Proposed Method


The proposed instance-aware colorization method differs from existing learning-based methods, which operate on the entire image only. Its key insight is that a clear figure-ground separation lets the network colorize objects more convincingly, so instance-level colorization must be tied to full-image colorization. Let us look at the architectural design of this framework.



-> Figure 2: Overview of the proposed method. (Source) <-

The network architecture consists of three components, each described in the sections below:

- an off-the-shelf object detector that localizes the instances in the image,
- two colorization networks, one for the instance images and one for the full image, and
- a fusion module that blends the feature maps of the two colorization networks.

Object Detection


-> Figure 3: Object detection pipeline of the InstColorization framework. <-

The framework takes a grayscale image $X$ as input and predicts the two remaining color channels ($a$ and $b$) in the CIE Lab color space. This color space describes all the colors visible to the human eye and was created to serve as a device-independent reference model. The InstColorization framework is therefore device-agnostic.
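To make the color space concrete, here is a minimal pure-Python sketch of the standard sRGB-to-Lab conversion (D65 white point) for a single pixel. In practice one would use a library routine such as `skimage.color.rgb2lab` on whole images; this function is only illustrative.

```python
import math

def rgb_to_lab(r, g, b):
    """Convert one sRGB pixel (components in [0, 1]) to CIE Lab.
    Illustrative sketch of the standard D65 conversion."""
    def to_linear(c):  # undo sRGB gamma
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = to_linear(r), to_linear(g), to_linear(b)
    # Linear RGB -> XYZ (D65), normalized by the white point.
    x = (0.4124564 * r + 0.3575761 * g + 0.1804375 * b) / 0.95047
    y = (0.2126729 * r + 0.7151522 * g + 0.0721750 * b) / 1.00000
    z = (0.0193339 * r + 0.1191920 * g + 0.9503041 * b) / 1.08883
    def f(t):
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
    fx, fy, fz = f(x), f(y), f(z)
    L = 116 * fy - 16        # lightness: the channel the model receives
    a = 500 * (fx - fy)      # green-red channel (predicted)
    b_ = 200 * (fy - fz)     # blue-yellow channel (predicted)
    return L, a, b_
```

Pure white maps to $L \approx 100$ with $a \approx b \approx 0$, and pure black to $L = 0$, which is why training only needs the $L$ channel as input and the $(a, b)$ pair as the target.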

As shown in Figure 3, an off-the-shelf pre-trained Mask R-CNN object detector is used to detect the instances in the image. The instances are cropped using the obtained bounding-box coordinates and then resized to a 256 × 256 resolution.

For training, a colored image dataset is used. The images are converted to the CIE Lab color space, and only the $L$ channel is kept; the other two channels are discarded. The object detector is run on this single-channel image, and the resulting bounding-box coordinates are used to crop instances from both the grayscale image and its colored counterpart.
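The crop-and-resize step can be sketched as below. The detector itself is taken off the shelf, so this illustrative function covers only what happens to each detected box; the function name and the plain nested-list image representation are my own, not from the paper's code, and nearest-neighbor sampling stands in for whatever interpolation the authors use.

```python
def crop_and_resize(image, box, out_size=256):
    """Crop a detected instance from a single-channel image and
    nearest-neighbor resize it to out_size x out_size.
    `image` is a 2-D list (H x W); `box` is (x0, y0, x1, y1) in pixels."""
    h, w = len(image), len(image[0])
    x0, y0, x1, y1 = box
    # Clamp the box to the image bounds before cropping.
    x0, y0 = max(0, x0), max(0, y0)
    x1, y1 = min(w, x1), min(h, y1)
    ch, cw = y1 - y0, x1 - x0
    out = []
    for i in range(out_size):
        src_y = y0 + i * ch // out_size
        row = []
        for j in range(out_size):
            src_x = x0 + j * cw // out_size
            row.append(image[src_y][src_x])
        out.append(row)
    return out
```

The same boxes are applied to both the $L$-channel image (network input) and the color image (training target), so each instance crop comes with its ground-truth colors.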

Image Colorization Backbone


-> Figure 4: Network architecture contains two branches of colorization networks, one for colorizing the instance images and the other for colorizing the full image. <-

The instance image ($X_i$) and the input grayscale image ($X$) are fed to the instance colorization network and the full-image colorization network, respectively. Both networks share the same architecture but have different weights.

Keeping the two architectures identical ensures that the feature maps at corresponding layers line up, which facilitates feature fusion. The authors use the main colorization network introduced in Real-Time User-Guided Image Colorization with Learned Deep Priors as the backbone.

For training, the full-image colorization network is trained first, and its trained weights are then used to initialize the instance colorization network.

Fusion Module


-> Figure 5: The feature fusion module. (Source) <-

To produce accurate and coherent colorizations, the authors propose a fusion module that blends the intermediate feature maps from the instance and full-image networks. One could naively overlay the instance feature maps onto the full-image ones to blend them, but that leads to visible artifacts due to inconsistencies at overlapping pixels.

The fusion takes place at multiple layers of the colorization network. For simplicity, let us discuss the module for the $j^{th}$ layer. Note that, because both networks share the same architecture, the feature maps from the two networks at the $j^{th}$ layer have the same shape.

Figure 5 summarizes the fusion module.
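A rough numpy sketch of the weighted fusion at one layer follows. It assumes, as the paper describes, that per-pixel weight maps are predicted for the full-image features and for each instance's features, and that the stacked weights are softmax-normalized before blending. The function and variable names, and the very-negative-logit padding trick used to zero out instance weights outside their boxes, are my own implementation choices, not the authors' code.

```python
import numpy as np

def fuse_features(full_feat, inst_feats, boxes, w_full, w_insts):
    """Blend instance feature maps into the full-image feature map.
    full_feat: (C, H, W) full-image features at this layer
    inst_feats: list of (C, h, w) instance features, already resized
                to their box size on this layer's feature grid
    boxes: list of (x0, y0, x1, y1) on the feature grid
    w_full: (H, W) predicted weight logits for the full image
    w_insts: list of (h, w) predicted weight logits per instance"""
    C, H, W = full_feat.shape
    logits, feats = [w_full], [full_feat]
    for feat, (x0, y0, x1, y1), w in zip(inst_feats, boxes, w_insts):
        # Instances only contribute inside their boxes; elsewhere a very
        # negative logit makes their softmax weight effectively zero.
        padded_w = np.full((H, W), -1e9)
        padded_w[y0:y1, x0:x1] = w
        padded_f = np.zeros((C, H, W))
        padded_f[:, y0:y1, x0:x1] = feat
        logits.append(padded_w)
        feats.append(padded_f)
    logits = np.stack(logits)                  # (N+1, H, W)
    weights = np.exp(logits - logits.max(0))
    weights /= weights.sum(0)                  # per-pixel softmax
    feats = np.stack(feats)                    # (N+1, C, H, W)
    return (weights[:, None] * feats).sum(0)   # fused (C, H, W)
```

Because the weights are normalized per pixel, regions covered by an instance smoothly blend instance and full-image features, while uncovered regions fall back to the full-image features alone.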



Before we get into the training details, let's look at some of the results.

Training Procedure

Training Dataset

The authors use the ImageNet and COCO-Stuff datasets to train and evaluate the model, and additionally the Places205 dataset to evaluate it on out-of-distribution samples.

Evaluation Metrics

The authors use PSNR and SSIM to quantify colorization quality, along with the perceptual metric LPIPS.

Training Details

The whole network is trained sequentially in a three-step process:

1. Train the full-image colorization network.
2. Initialize the instance colorization network with the full-image network's weights, then train it on the cropped instances.
3. Freeze both colorization networks and train the fusion module on its own.

Ending Note

The field of image colorization is exciting and challenging. We have seen much progress in recent times, and newer work will keep pushing the quality further.

This work is promising. The insight that a clear figure-ground separation can dramatically improve colorization performance is borne out in the results. Since the framework leverages an off-the-shelf object detection model, better detectors should translate directly into better colorizations.

In my opinion, the novel bit of instance-aware colorization is that it can use any existing learning-based colorization architecture as the backbone for both the full-image and instance colorization networks.

I hope you found this summary insightful and that it encourages you to read the paper. Leave your thoughts in the comments below; if you have any questions, I would love to address them.