
Composer: Diffusion-based Image Synthesis with Composable Conditions


Composer is a conditional diffusion model that allows for greater control over image synthesis. The central idea is to decompose an image into a set of representations, attributes such as its caption, color statistics, sketch, and so on, and train the model to generate images from any combination of them.
The authors also describe Composer as a general framework for a variety of generative tasks.
The decomposition includes the following representations (a rough extraction sketch follows the list):
  • Caption: the text from image-text training pairs
  • Semantics and style: CLIP image embeddings
  • Color: color statistics of the image as a CIELab histogram
  • Sketch: an edge map produced by an edge detection model
  • Instances: instances detected in the image with YOLOv5
  • Depth map: the output of a depth estimation model
  • Intensity: a grayscale version of the image, which helps the model learn color intensity separately from color
  • Masking: random masks, which enable inpainting
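Most of these signals come from off-the-shelf tools, so they are cheap to compute per image. Below is a minimal sketch of extracting a few of them for a single image; note the assumptions: Canny edges stand in for the learned edge detector used in the paper, and the CLIP checkpoint name is just one plausible choice, not necessarily the one the authors used.

```python
# Sketch: extracting a few of Composer's conditioning signals for one image.
# Assumptions: Canny edges approximate the paper's edge-detection model, and
# "openai/clip-vit-large-patch14" is only an illustrative CLIP checkpoint.
import cv2
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

image = Image.open("example.jpg").convert("RGB")
rgb = np.array(image)

# Intensity: a plain grayscale version of the image.
intensity = np.array(image.convert("L"))

# Sketch: an edge map (Canny here, a learned edge detector in the paper).
sketch = cv2.Canny(cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY), 100, 200)

# Color: coarse color statistics as a CIELab histogram.
lab = cv2.cvtColor(rgb, cv2.COLOR_RGB2LAB)
color_hist = cv2.calcHist([lab], [0, 1, 2], None, [8, 8, 8],
                          [0, 256, 0, 256, 0, 256]).flatten()
color_hist /= color_hist.sum()  # normalize to a distribution

# Semantics and style: a CLIP image embedding.
clip = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
with torch.no_grad():
    clip_embed = clip.get_image_features(**processor(images=image, return_tensors="pt"))

conditions = {
    "intensity": intensity,
    "sketch": sketch,
    "color": color_hist,
    "clip_embedding": clip_embed,
}
```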
The model is capable of a wide variety of tasks (a hypothetical compositional-sampling sketch follows the list):
  • Creating variations of an image by varying a particular representation
  • Interpolating between two images to create a blend
  • Directly reconfiguring a single aspect of an image
  • Masking out a region to restrict where the model can edit the image
  • Colorizing an image based on a color palette
  • Style transfer!
  • Pose transfer
  • Virtual try-on 😂: masking out a person's clothing in one image and replacing it with a garment from another image
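There is no standard API for this kind of compositional sampling, but conceptually each of these tasks boils down to mixing and matching condition sets from different source images. The sketch below is purely hypothetical: `ComposerPipeline`-style objects, `extract_conditions`, and `generate` are invented names used for illustration, not a real interface.

```python
# Hypothetical interface for composable conditioning. The pipeline object,
# extract_conditions, and generate are invented names for illustration only.

def style_transfer(pipeline, content_img, style_img):
    """Keep structure from one image, take semantics/style from another."""
    content = pipeline.extract_conditions(content_img)  # e.g. sketch, depth map
    style = pipeline.extract_conditions(style_img)      # e.g. CLIP embedding, color
    return pipeline.generate(conditions={
        "sketch": content["sketch"],
        "depth": content["depth"],
        "clip_embedding": style["clip_embedding"],
        "color": style["color"],
    })

def image_variation(pipeline, img, keep=("sketch", "color")):
    """Resample an image while holding only a subset of its conditions fixed."""
    conds = pipeline.extract_conditions(img)
    return pipeline.generate(conditions={k: conds[k] for k in keep})
```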
Image synthesis and generative AI are extremely popular right now. I remember when StyleGAN{1, 2, 3} came out! The images were unreal. Now, a few years down the line, with the rise of diffusion models, we have models that not only generate images but can be controlled to this degree!
It might not be long until these models get an upgrade and start generating videos (like Imagen Video, but with the degree of user control that Composer has, if not more!).

