HuggingFace Now Supports Ultra Fast ControlNet
HuggingFace has launched support for ControlNet, bringing greater control (and speed) to the image synthesis process for diffusion models like Stable Diffusion.
Created on March 6|Last edited on March 7
ControlNet, a model similar to Composer (though released earlier), allows the user to exert more control over the image synthesis process.
What is ControlNet?
ControlNet is a neural network architecture that "controls large image diffusion models (like Stable Diffusion) to learn task-specific input conditions." The authors state ControlNet is scalable to any dataset size, preserves pretraining performance, is fast and end-to-end, and ultimately gives the user more control over synthesizing images. The architecture is visualized below.
ControlNet makes two copies of an existing, trained large image diffusion model (like Stable Diffusion): a locked copy and a trainable copy. The locked copy has frozen weights, while the trainable copy is fine-tuned on a specific dataset of your choice.

The idea is that this ControlNet copy wraps around the original network's blocks, tuned to a specific dataset. It's a bit like a ResNet skip connection, but spanning whole blocks! The two copies are joined by zero convolutions: 1×1 convolutional layers whose weights and biases are initialized to zero, so the trainable branch contributes nothing at the start of training.
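To make the zero-convolution trick concrete, here's a minimal NumPy sketch of the wiring. The block functions (`frozen_block`, `trainable_copy`) are hypothetical stand-ins for real diffusion model blocks, and the 1×1 convolution is represented as a per-pixel linear map. The key property it demonstrates: at initialization, the zero convolutions output zeros, so the wrapped model behaves exactly like the original frozen model regardless of the condition.

```python
import numpy as np

def zero_conv(channels):
    """A 1x1 'zero convolution': weights and bias both start at zero."""
    return {"w": np.zeros((channels, channels)), "b": np.zeros(channels)}

def apply_conv(conv, x):
    # x has shape (C, H, W); a 1x1 conv just mixes channels at each pixel.
    c, h, w = x.shape
    out = conv["w"] @ x.reshape(c, h * w) + conv["b"][:, None]
    return out.reshape(c, h, w)

def frozen_block(x):
    """Stand-in for a frozen Stable Diffusion block (hypothetical)."""
    return np.tanh(x)

def trainable_copy(x):
    """Stand-in for the trainable copy, initialized identically."""
    return np.tanh(x)

# ControlNet wiring: y = F(x) + Z2( F_copy( x + Z1(c) ) )
channels = 4
x = np.random.randn(channels, 8, 8)   # feature map entering the block
c = np.random.randn(channels, 8, 8)   # external condition (e.g. edge-map features)
z1, z2 = zero_conv(channels), zero_conv(channels)

y = frozen_block(x) + apply_conv(z2, trainable_copy(x + apply_conv(z1, c)))

# At init the trainable branch contributes nothing, so the output matches
# the frozen model exactly; training then grows the branch gradually.
assert np.allclose(y, frozen_block(x))
```

This is why pretraining performance is preserved: training only ever moves the output away from the frozen model's behavior as the zero convolutions learn nonzero weights.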
The vector c is an external conditioning vector determined by the dataset you pick. This fast fine-tuning framework is task-specific, meaning it tunes a model to a particular task (e.g., pose transfer or style transfer).
They tested different image-based conditions:
- Canny edge detection
- Hough line
- HED Boundary
- User sketches
- Human Pose
- Semantic Segmentation
- Depth maps
- Normal image-caption pairs
- Cartoon line drawings
Here's one of many results they show. This one is based on user sketches.

HuggingFace and ControlNet
HuggingFace recently integrated ControlNet with Stable Diffusion into their Diffusers library. It's exactly as ControlNet is defined in the paper, but they employ a few tricks to speed up computation: generating results takes only about 4 GB of VRAM and a few seconds of compute on a V100. Check out the blog for more!
References
Paul, Sayak, et al. “Ultra Fast ControlNet with 🧨 Diffusers.” Hugging Face Blog, 3 Mar. 2023.
Tags: ML News