Using Stable Diffusion VAE to encode satellite images
We use the pretrained Stable Diffusion Variational Autoencoder (VAE) to encode satellite imagery into latent space.
Created on January 24 | Last edited on March 9

Visualizing encoded images
We encode our satellite images into latent space with the Stable Diffusion VAE and visualize the latents in a wandb.Table. Finally, we decode the latents back to image space and, surprisingly, get back an almost lossless copy of the input.
This means we can train latent-diffusion models on these latents, saving huge amounts of compute. The encoding can also be done offline, before training the diffusion pipeline.
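To see where the savings come from: the SD v1 VAE downsamples each spatial dimension by 8x and produces 4 latent channels, so a 512x512 RGB image becomes a 1x4x64x64 latent. A quick back-of-the-envelope check (the 512x512 tile size here is illustrative):

```python
# Back-of-the-envelope: SD v1 VAE maps 3 x H x W pixels -> 4 x H/8 x W/8 latents
H = W = 512                             # illustrative tile size
pixel_elems = 3 * H * W                 # elements in the RGB image
latent_elems = 4 * (H // 8) * (W // 8)  # elements in the latent
print(pixel_elems / latent_elems)       # -> 48.0
```

So the diffusion model operates on roughly 48x fewer elements per image than it would in pixel space.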
The code to encode and decode the images is below:
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae")

def encode_img(input_img):
    # Single image -> single latent in a batch (so size 1, 4, 64, 64)
    if len(input_img.shape) < 4:
        input_img = input_img.unsqueeze(0)
    with torch.no_grad():
        latent = vae.encode(input_img * 2 - 1)  # Note scaling to [-1, 1]
    return 0.18215 * latent.latent_dist.sample()

def decode_img(latents):
    # Batch of latents -> batch of images
    latents = (1 / 0.18215) * latents
    with torch.no_grad():
        image = vae.decode(latents).sample
    image = (image / 2 + 0.5).clamp(0, 1)
    return image.detach()
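Since the latents are small, the offline encoding mentioned above can be as simple as running the encoder once over the dataset and saving each latent to disk. A minimal sketch, where `cache_latents` and the `(name, tensor)` input format are our own hypothetical choices (pass `encode_img` from above as `encode_fn`):

```python
import torch
from pathlib import Path

def cache_latents(images, out_dir, encode_fn):
    # images: iterable of (name, image_tensor) pairs
    # encode_fn: image tensor -> latent tensor (e.g. encode_img above)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, img in images:
        latent = encode_fn(img)
        # Save one latent per image; the training dataloader reads these .pt files
        torch.save(latent.cpu(), out / f"{name}.pt")
```

The training dataloader then loads the cached `.pt` files directly, so the VAE encoder never runs inside the training loop.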