HuggingFace Now Supports Ultra Fast ControlNet
HuggingFace has launched support for ControlNet, bringing greater control (and speed) to the image synthesis process for diffusion models like Stable Diffusion.
Created on March 6|Last edited on March 7
ControlNet, a model similar to Composer (though released earlier), allows the user to exert more control over the image synthesis process.
What is ControlNet?
ControlNet is a neural network architecture that "controls large image diffusion models (like Stable Diffusion) to learn task-specific input conditions." The authors state ControlNet is scalable to any dataset size, preserves pretraining performance, is fast and end-to-end, and ultimately gives the user more control over synthesizing images. The architecture is visualized below.
ControlNet makes two copies of an existing, trained large image diffusion model (like Stable Diffusion): a locked copy and a trainable copy. The locked copy has frozen weights, while the trainable copy is fine-tuned on a specific dataset of your choice.

The idea is that this ControlNet copy wraps around the original network's blocks, tuned to a specific dataset. It's a bit like a ResNet skip connection, but spanning whole blocks! The two copies are joined by zero convolutions: 1×1 convolutional layers whose weights and biases are initialized to zero, so the trainable branch contributes nothing at the start of training.
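To make the zero-convolution trick concrete, here's a minimal NumPy sketch of the wiring. The block functions (`frozen_block`, `trainable_copy`) are hypothetical stand-ins for real diffusion model blocks, and the 1×1 convolution is represented as a per-pixel linear map. The key property it demonstrates: at initialization, the zero convolutions output zeros, so the wrapped model behaves exactly like the original frozen model regardless of the condition.

```python
import numpy as np

def zero_conv(channels):
    """A 1x1 'zero convolution': weights and bias both start at zero."""
    return {"w": np.zeros((channels, channels)), "b": np.zeros(channels)}

def apply_conv(conv, x):
    # x has shape (C, H, W); a 1x1 conv just mixes channels at each pixel.
    c, h, w = x.shape
    out = conv["w"] @ x.reshape(c, h * w) + conv["b"][:, None]
    return out.reshape(c, h, w)

def frozen_block(x):
    """Stand-in for a frozen Stable Diffusion block (hypothetical)."""
    return np.tanh(x)

def trainable_copy(x):
    """Stand-in for the trainable copy, initialized identically."""
    return np.tanh(x)

# ControlNet wiring: y = F(x) + Z2( F_copy( x + Z1(c) ) )
channels = 4
x = np.random.randn(channels, 8, 8)   # feature map entering the block
c = np.random.randn(channels, 8, 8)   # external condition (e.g. edge-map features)
z1, z2 = zero_conv(channels), zero_conv(channels)

y = frozen_block(x) + apply_conv(z2, trainable_copy(x + apply_conv(z1, c)))

# At init the trainable branch contributes nothing, so the output matches
# the frozen model exactly; training then grows the branch gradually.
assert np.allclose(y, frozen_block(x))
```

This is why pretraining performance is preserved: training only ever moves the output away from the frozen model's behavior as the zero convolutions learn nonzero weights.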
The vector c is an external conditioning vector determined by the dataset you pick. This fast fine-tuning framework is task-specific, meaning it tunes a model to a particular task (e.g., pose transfer or style transfer).
They tested different image-based conditions:
- Canny edge detection
- Hough line
- HED Boundary
- User sketches
- Human Pose
- Semantic Segmentation
- Depth maps
- Normal image-caption pairs
- Cartoon line drawings
Here's one of many results they show. This one is based on user sketches.

HuggingFace and ControlNet
HuggingFace recently integrated ControlNet with Stable Diffusion into their Diffusers library. It's exactly as ControlNet is defined in the paper, but they employ a few tricks to speed up computation: generating results takes only about 4 GB of VRAM and a few seconds of compute on a V100. Check out the blog for more!
References
Paul, Sayak, et al. “Ultra Fast ControlNet with 🧨 Diffusers.” Hugging Face Blog, 3 Mar. 2023.
Tags: ML News