
Using Stable Diffusion VAE to encode satellite images

We take the pretrained Stable Diffusion Variational Autoencoder (VAE) and use it to encode satellite imagery into latent space.

Visualizing encoded images

We encode our satellite images into latent space using the Stable Diffusion VAE. Then we visualize the latents with a wandb.Table. Finally, we decode the latents back to image space and, perhaps surprisingly, we get back an almost lossless copy of the input.
This means we can train latent-diffusion models on these latents, saving a huge amount of compute: a 3x512x512 image becomes a 4x64x64 latent, a 48x reduction in elements. The encoding can also be done offline, before training the diffusion pipeline (a precomputation sketch appears at the end of this report).
The code to encode and decode the images is below:
import torch
from diffusers import AutoencoderKL
from PIL import Image

vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae")

def encode_img(input_img):
    # Single image -> single latent in a batch (so shape 1, 4, 64, 64 for a 512x512 input)
    if len(input_img.shape) < 4:
        input_img = input_img.unsqueeze(0)
    with torch.no_grad():
        latent = vae.encode(input_img * 2 - 1)  # scale [0, 1] pixels to [-1, 1], the range the VAE expects
    return 0.18215 * latent.latent_dist.sample()

def decode_img(latents):
    # Batch of latents -> batch of images
    latents = (1 / 0.18215) * latents
    with torch.no_grad():
        image = vae.decode(latents).sample
    image = (image / 2 + 0.5).clamp(0, 1)  # back from [-1, 1] to [0, 1]
    return image.detach()
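
For example, we can round-trip a single tile and log the input, the four latent channels, and the reconstruction side by side in a wandb.Table. This is a minimal sketch rather than the exact logging code from the run: the file name satellite_tile.png and the project name sd-vae-satellite are placeholders.

import numpy as np
import wandb
from torchvision import transforms as T

# Load one tile as a (3, 512, 512) tensor in [0, 1]. The path is a placeholder.
img = T.ToTensor()(Image.open("satellite_tile.png").convert("RGB").resize((512, 512)))

latents = encode_img(img)    # (1, 4, 64, 64)
recon = decode_img(latents)  # (1, 3, 512, 512), values in [0, 1]

run = wandb.init(project="sd-vae-satellite")  # placeholder project name
table = wandb.Table(columns=["input", "c0", "c1", "c2", "c3", "reconstruction"])

def to_gray(channel):
    # Min-max normalise a (64, 64) latent channel so it renders as a grayscale image.
    c = channel - channel.min()
    c = c / (c.max() + 1e-8)
    return wandb.Image((c.numpy() * 255).astype(np.uint8))

table.add_data(
    wandb.Image(T.ToPILImage()(img)),
    *[to_gray(c) for c in latents[0]],
    wandb.Image(T.ToPILImage()(recon[0])),
)
run.log({"vae_roundtrip": table})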

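And since the VAE stays frozen, the encoding pass can run once, ahead of diffusion training, with the latents saved to disk. A minimal sketch, assuming the tiles sit in a flat folder of PNGs (the tiles and latents folder names are placeholders):

import os
from torchvision import transforms as T

TILE_DIR = "tiles"       # placeholder input folder
LATENT_DIR = "latents"   # placeholder output folder
os.makedirs(LATENT_DIR, exist_ok=True)

to_tensor = T.ToTensor()
for name in os.listdir(TILE_DIR):
    if not name.endswith(".png"):
        continue
    img = to_tensor(Image.open(os.path.join(TILE_DIR, name)).convert("RGB").resize((512, 512)))
    latent = encode_img(img)[0]  # (4, 64, 64)
    torch.save(latent, os.path.join(LATENT_DIR, name.replace(".png", ".pt")))

During training, the data loader then reads the .pt files directly instead of running the VAE encoder on every batch.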