
Using Stable Diffusion VAE to encode satellite images

We take the pretrained Stable Diffusion Variational Autoencoder (VAE) and use it to encode satellite imagery into latent space.

Visualizing encoded images

We encode our satellite images into latent space using the Stable Diffusion VAE. Then we visualize the latents with a wandb.Table. Finally, we decode the latents back to image space and, perhaps surprisingly, we get back an almost lossless copy of the input.
This means we can train latent-diffusion models on these latents, saving a huge amount of compute: a 3x512x512 image becomes a 4x64x64 latent, a 48x reduction in elements. The encoding can also be done offline, before training the diffusion pipeline (a precomputation sketch appears at the end of this report).
The code to encode and decode the images is below:
import torch
from diffusers import AutoencoderKL
from PIL import Image

vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae")

def encode_img(input_img):
    # Single image -> single latent in a batch (so shape 1, 4, 64, 64 for a 512x512 input)
    if len(input_img.shape) < 4:
        input_img = input_img.unsqueeze(0)
    with torch.no_grad():
        latent = vae.encode(input_img * 2 - 1)  # scale [0, 1] pixels to [-1, 1], the range the VAE expects
    return 0.18215 * latent.latent_dist.sample()

def decode_img(latents):
    # Batch of latents -> batch of images
    latents = (1 / 0.18215) * latents
    with torch.no_grad():
        image = vae.decode(latents).sample
    image = (image / 2 + 0.5).clamp(0, 1)  # back from [-1, 1] to [0, 1]
    return image.detach()
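
For example, we can round-trip a single tile and log the input, the four latent channels, and the reconstruction side by side in a wandb.Table. This is a minimal sketch rather than the exact logging code from the run: the file name satellite_tile.png and the project name sd-vae-satellite are placeholders.

import numpy as np
import wandb
from torchvision import transforms as T

# Load one tile as a (3, 512, 512) tensor in [0, 1]. The path is a placeholder.
img = T.ToTensor()(Image.open("satellite_tile.png").convert("RGB").resize((512, 512)))

latents = encode_img(img)    # (1, 4, 64, 64)
recon = decode_img(latents)  # (1, 3, 512, 512), values in [0, 1]

run = wandb.init(project="sd-vae-satellite")  # placeholder project name
table = wandb.Table(columns=["input", "c0", "c1", "c2", "c3", "reconstruction"])

def to_gray(channel):
    # Min-max normalise a (64, 64) latent channel so it renders as a grayscale image.
    c = channel - channel.min()
    c = c / (c.max() + 1e-8)
    return wandb.Image((c.numpy() * 255).astype(np.uint8))

table.add_data(
    wandb.Image(T.ToPILImage()(img)),
    *[to_gray(c) for c in latents[0]],
    wandb.Image(T.ToPILImage()(recon[0])),
)
run.log({"vae_roundtrip": table})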

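And since the VAE stays frozen, the encoding pass can run once, ahead of diffusion training, with the latents saved to disk. A minimal sketch, assuming the tiles sit in a flat folder of PNGs (the tiles and latents folder names are placeholders):

import os
from torchvision import transforms as T

TILE_DIR = "tiles"       # placeholder input folder
LATENT_DIR = "latents"   # placeholder output folder
os.makedirs(LATENT_DIR, exist_ok=True)

to_tensor = T.ToTensor()
for name in os.listdir(TILE_DIR):
    if not name.endswith(".png"):
        continue
    img = to_tensor(Image.open(os.path.join(TILE_DIR, name)).convert("RGB").resize((512, 512)))
    latent = encode_img(img)[0]  # (4, 64, 64)
    torch.save(latent, os.path.join(LATENT_DIR, name.replace(".png", ".pt")))

During training, the data loader then reads the .pt files directly instead of running the VAE encoder on every batch.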