3D Image Inpainting

A novel way to convert a single RGB-D image into a 3D image. Made by Ayush Thakur using Weights & Biases

In this report, we introduce some key components of the [3D Photography using Context-aware Layered Depth Inpainting](https://shihmengli.github.io/3D-Photo-Inpainting/) paper and look at intermediate results alongside stunning 3D images. Since we can see the NEOWISE comet this month, the report is space-themed and tries to bring images from space to life.

Reproduce results in this colab $\rightarrow$

Alternatively, you can use the forked repo which lets you visualize your model predictions in Weights & Biases here $\rightarrow$


3D pictures can take your photography to a whole new dimension. However, creating such parallax effects with classical reconstruction and rendering techniques requires elaborate setup and specialized hardware, which is not always feasible.
Depth is the most important aspect of 3D photography. A 3D image can be created by taking two shots of the same scene, one slightly offset from the other. This slight difference is enough to trick your brain into thinking you are looking at an image with depth. Recent advancements in cell phone cameras, like dual-lens cameras, enable capturing depth information. The resulting image is an RGB-D (color and depth) image. To generate a lifelike view from this RGB-D image, the occlusions revealed by parallax must be filled in.
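As a toy illustration of that two-shot relationship (all numbers below are made up for illustration, not values from the paper), depth falls out of the pixel disparity between the two offset views:

```python
import numpy as np

# Toy stereo-to-depth sketch: for two horizontally offset shots,
# depth is inversely proportional to pixel disparity:
#   depth = focal_length * baseline / disparity
focal_length_px = 700.0   # hypothetical focal length in pixels
baseline_m = 0.012        # hypothetical dual-lens spacing in meters
disparity_px = np.array([35.0, 14.0, 7.0])  # larger disparity = closer
depth_m = focal_length_px * baseline_m / disparity_px
# depth_m -> [0.24, 0.6, 1.2] meters
```

Nearby objects shift more between the two shots, which is exactly the cue your brain (and a dual-lens camera) uses to infer depth.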
In this paper, the authors propose a method for converting a single RGB-D image into a 3D photo. They use a Layered Depth Image (LDI) as the underlying representation and present a learning-based inpainting model that can synthesize new color and depth content in the occluded regions.
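To make the LDI idea concrete, here is a minimal sketch of what a single LDI pixel might store. This is a simplified stand-in of my own, not the paper's actual linked-layer implementation:

```python
from dataclasses import dataclass, field

@dataclass
class LDIPixel:
    """One pixel of a Layered Depth Image: instead of a single
    color/depth value, it keeps a list of (rgb, depth) layers,
    so surfaces occluded behind the front one are still represented."""
    layers: list = field(default_factory=list)  # [(rgb, depth), ...]

    def add_layer(self, rgb, depth):
        self.layers.append((rgb, depth))
        self.layers.sort(key=lambda layer: layer[1])  # nearest first

    def front(self):
        """Return the visible (nearest) layer, if any."""
        return self.layers[0] if self.layers else None
```

When the camera moves, layers behind the front one become visible, and it is exactly those hidden layers that the inpainting model must synthesize.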

The Paper $\rightarrow$

Here is a short video from the authors showcasing their stunning results.

Overview of the Paper

Some prerequisites that the proposed method relies on are:

With some context, we can now look into some details of this paper. This paper is rich with techniques that require narrow attention, but in my opinion, the best way to summarize this paper is to go through the proposed method and look at individual components along the way.

Method Overview

In the example below, the input RGB image is shown first, followed by the depth map estimated by the pre-trained depth estimation model.

Notice the blurred edges. A bilateral median filter is used to sharpen them, followed by a few more preprocessing steps.
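A rough sketch of what such an edge-preserving filter does, using plain NumPy (a simplified stand-in for the paper's bilateral median filter, with hypothetical parameter values):

```python
import numpy as np

def bilateral_median_filter(depth, guide, radius=2, sigma=0.1):
    """Replace each depth value with the median of neighboring depths
    whose guide (e.g. intensity) values are close to the center pixel's.
    Averaging only within similar-guide neighbors keeps depth
    discontinuities sharp instead of smearing them."""
    h, w = depth.shape
    out = depth.copy()
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            patch_d = depth[y0:y1, x0:x1]
            patch_g = guide[y0:y1, x0:x1]
            mask = np.abs(patch_g - guide[y, x]) < sigma
            out[y, x] = np.median(patch_d[mask])
    return out
```

On a sharp step in the guide image, the filter removes depth outliers on each side without blurring across the step.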

Figure 1: Steps showing one iteration of inpainting. (Source)

Figure 2: The inpainting network. (Source)

Now let us dive into the exciting part.


Reproduce results in this colab $\rightarrow$

Alternatively, you can use the forked repo, which lets you visualize your model predictions in Weights & Biases, as we have done here $\rightarrow$

After successful 3D image inpainting in the Google Colab, you will find an image_name.ply file in the mesh directory. It is the inpainted 3D mesh generated by integrating all the inpainted depth and color values back into the original LDI. I was curious to look at this mesh. With some investigation, I realized that it is a point cloud. Being new to this, the easiest way for me to visualize it was to log the point cloud to the wandb dashboard using wandb.log. Learn more about doing this here. I used Open3D to load the .ply file. The point cloud object has depth and color information, shown below, though I am not yet sure how to interpret them. A curious endeavor nevertheless.
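For reference, here is roughly how the logging can be wired up. Open3D stores colors in [0, 1], while wandb.Object3D expects an (N, 6) array with RGB in [0, 255]; the helper name and file path below are my own, not from the repo:

```python
import numpy as np

def pointcloud_to_wandb(points, colors):
    """Stack XYZ coordinates and 0-255 RGB colors into the
    (N, 6) array that wandb.Object3D accepts."""
    points = np.asarray(points, dtype=np.float64)
    colors = np.asarray(colors, dtype=np.float64)
    # Open3D colors live in [0, 1]; rescale to [0, 255] for wandb.
    rgb = np.clip(colors * 255.0, 0.0, 255.0)
    return np.concatenate([points, rgb], axis=1)

# Hypothetical usage (assumes open3d and wandb are installed):
# import open3d as o3d, wandb
# pcd = o3d.io.read_point_cloud("mesh/image_name.ply")
# arr = pointcloud_to_wandb(pcd.points, pcd.colors)
# wandb.log({"inpainted_mesh": wandb.Object3D(arr)})
```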

How Were the Models Trained?

Training Data

The authors generated training data from the MS COCO dataset. They first applied the pre-trained depth estimation model to the COCO images to obtain depth maps. They then extracted context/synthesis regions (briefly explained above) to form a pool of such regions. Regions were randomly sampled from this pool and placed on different images in the MS COCO dataset. Of at most three such regions per image, one was selected at random during training.
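That sampling step can be sketched as follows. The function name and structure are my own, purely illustrative of the "place up to three, pick one" scheme:

```python
import random

def sample_training_region(region_pool, max_regions=3, rng=random):
    """Sketch of the sampling described above: draw up to
    `max_regions` pooled context/synthesis regions for an image,
    then pick one of them at random for the training step."""
    k = min(max_regions, len(region_pool))
    placed = rng.sample(region_pool, k=k)  # regions placed on the image
    return rng.choice(placed)              # the one used for training
```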


Check out the supplementary material of the paper for more information on training details.

It is time for some beautiful 3D images. :fire:

What's Next and Conclusion

The goal of this report is to summarize the paper, making it more accessible for the readers. I have used lines from the paper at places because that was the best way to convey the information.

A few of the things that excited me most about the proposed method are as follows:

Thank you for your time. For constructive feedback on summarizing this paper, reach out to me on Twitter, @ayushthakur0.