The Sky Is In Our Grasp!

This report explores an interesting paper called Castle in the Sky: Dynamic Sky Replacement and Harmonization in Videos. Made by Ayush Thakur using W&B.

You might have used Instagram or Snapchat filters: cat whiskers that look cute, vampire teeth that look spooky. Have you ever wondered how this is possible? A short technical answer is that facial landmarks are automatically detected, and components like whiskers are placed at the appropriate landmarks. Interesting!

How about replacing the sky with something of your choice? You want the video that you shot on a cloudy day to have the warmth of a sunny day. Alternatively, you want a cool thunderstorm effect in the background. In this report, we will explore a technique that enables dynamic sky replacement and harmonization.

Project Website | Paper | Colab Notebook


The sky is one of the vital components in outdoor photography as well as videography. The photographer usually has to deal with uncontrollable weather and lighting conditions, which can lead to an overexposed or plain-looking sky. To overcome this, the photographer can use special hardware equipment that might not be affordable for everyone.

Software-based automatic sky editing is an affordable option, and recent computer vision advancements can benefit this space. Existing methods either require laborious and time-consuming manual work or have specific camera requirements. To overcome these issues, the authors of this paper proposed a new solution that can generate realistic and dramatic sky backgrounds in videos with controllable styles.

We will overview the proposed method in the next section.

The video by the authors shows some excellent results produced by the proposed method.


Overview of the Proposed Method


The proposed method consists of three key components:

Sky Matting Network

Image matting plays an essential role in image and video editing and encompasses many methods for separating the foreground of interest from an image. The foreground, which in our case is everything except the sky, is separated by predicting a soft "matte".

Contrary to previous methods that rely on binary pixel-wise classification (foreground vs. sky), the proposed sky matting network produces a soft sky matte for a more accurate detection result and a more visually pleasing blending effect.

The authors use a deep convolutional U-shaped network that consists of an encoder $E$ and a decoder $D$. This network predicts a coarse sky matte. A coarse-to-fine refinement module then takes in the coarse matte and the high-resolution input frame to produce the refined sky matte.

Details of the U-shaped network


Details of the refinement module
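To make the coarse-to-fine idea concrete, here is a minimal NumPy sketch. It stands in for the learned refinement module with a simple nearest-neighbour upsample: the paper's actual refinement is a trained CNN that fuses the upsampled coarse matte with the high-resolution frame, and the values below are hypothetical.

```python
import numpy as np

def upsample_nearest(matte, factor):
    """Nearest-neighbour upsampling of a coarse matte.

    Illustrative stand-in for the learned refinement: the real module is a
    CNN that also consumes the high-resolution input frame.
    """
    return np.kron(matte, np.ones((factor, factor)))

# A hypothetical 2x2 coarse sky matte predicted at low resolution
# (1.0 = pure sky, 0.0 = pure foreground).
coarse = np.array([[1.0, 0.2],
                   [0.6, 0.0]])

# Bring it to frame resolution, ready to be fused with the input frame.
refined_input = upsample_nearest(coarse, 2)  # shape (4, 4)
```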

Motion Estimation

This component is responsible for capturing the motion of the sky. Why is that necessary? The sky video captured by the "virtual camera" must be rendered and synchronized with the real camera's motion; otherwise the replacement sky would appear frozen while the scene moves.
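Because the sky is effectively at infinity, its apparent motion between frames can be modeled by a simple 2x3 affine matrix applied to pixel coordinates. The sketch below shows how such a per-frame motion matrix would shift the sky template; the matrix values are hypothetical, and the paper's actual estimator details may differ.

```python
import numpy as np

def warp_affine(points, A):
    """Apply a 2x3 affine motion matrix A to an Nx2 array of pixel coords."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])  # Nx3
    return homogeneous @ A.T  # Nx2 transformed coordinates

# Hypothetical estimated motion for one frame: a 3-pixel rightward pan.
A = np.array([[1.0, 0.0, 3.0],
              [0.0, 1.0, 0.0]])

# A reference point on the sky template moves with the camera.
sky_anchor = np.array([[10.0, 20.0]])
moved = warp_affine(sky_anchor, A)  # shifted 3 pixels along x
```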

Sky Image Blending
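This step composites the motion-aligned sky template into each frame using the predicted soft matte, together with harmonization so the foreground matches the new sky's colors. A minimal NumPy sketch of the alpha-blending core (harmonization omitted; all pixel values are toy examples) also shows why a soft matte blends more gracefully than a binary mask:

```python
import numpy as np

def blend(frame, new_sky, matte):
    """Alpha-blend a replacement sky into a frame using a sky matte.

    matte holds per-pixel sky opacity in [0, 1]; 1 = pure sky, 0 = foreground.
    """
    matte = matte[..., None]  # broadcast over the color channels
    return matte * new_sky + (1.0 - matte) * frame

# Toy 2x2 RGB frames: dark foreground, bright replacement sky.
frame   = np.full((2, 2, 3), 0.2)
new_sky = np.full((2, 2, 3), 0.9)

hard = np.array([[1.0, 0.0], [1.0, 0.0]])  # binary mask: abrupt transition
soft = np.array([[1.0, 0.5], [0.8, 0.0]])  # soft matte: gradual falloff

out_soft = blend(frame, new_sky, soft)
# At the 0.5 boundary pixel the result is midway between sky and foreground,
# which is what gives soft mattes their visually pleasing transitions.
```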


Now let us admire the awesomeness of this proposed technique. The authors have used the method for video augmentation (sky replacement) and weather/lighting translation. Let us look at both of them separately.

I have built the linked Colab notebook from the one provided by the authors but have simplified it so that you can augment your own video easily.

Reproduce the Results on Colab Notebook $\rightarrow$

Video Sky Augmentation

Weather/Lighting Translation