Stable Diffusion: A Model To Rival DALL·E 2 With Fewer Restrictions
A new text-to-image generation model is coming out, this time without nearly as many restrictions. Stable Diffusion rivals DALL·E 2 and other big-name image generation models while promising much more freedom in what can be generated.
Created on August 12|Last edited on August 16
AI engineers at Stability AI have been hard at work developing a new text-to-image generation model called Stable Diffusion. It rivals current state-of-the-art models like DALL·E 2 and Imagen while promising to place far fewer restrictions on what can be generated.
Many users of the DALL·E 2 beta have complained about the keyword-level prompt filters set in place by OpenAI, as well as the model tweaks implemented to avoid generating likenesses of real people. Stable Diffusion comes without many of those restrictions, giving users the power of a large-parameter text-to-image model with the freedom of currently available smaller ones like Craiyon.

The controversy of unrestricted AI image generation
Of course, while some people will be in favor of unrestricted image generation like this, others may feel it does more harm than good. While the terms of service still outline a few restrictions on problematic imagery, Stable Diffusion has been used to generate images of nude models, present-day military conflicts, and political or religious figures in incongruous situations.

The recent controversy of GPT-4chan comes to mind, and we all know the insanity that's come from Craiyon over its lifetime. Both sides of the debate have good reasons to be for or against AI models like this.
How does Stable Diffusion work?
Stable Diffusion is a diffusion model: a type of image generation model that starts from a vector of random noise and gradually refines it into a coherent image over a series of denoising steps.
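For intuition, here's a minimal toy sketch of that denoising loop, written as a standard DDPM-style sampler rather than Stability AI's actual code; the noise schedule is assumed, and the noise predictor is a placeholder for the trained, text-conditioned network.

```python
# Toy sketch of a reverse-diffusion (denoising) loop, not Stable Diffusion's real code.
import torch

num_steps = 50                                   # number of denoising steps
betas = torch.linspace(1e-4, 0.02, num_steps)    # assumed noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def predict_noise(x_t, t):
    # Placeholder for the trained, text-conditioned noise predictor (a U-Net in practice).
    return torch.zeros_like(x_t)

x = torch.randn(1, 3, 64, 64)                    # start from pure Gaussian noise
for t in reversed(range(num_steps)):
    eps = predict_noise(x, t)
    # Standard DDPM update: subtract the predicted noise, rescale,
    # then re-inject a smaller amount of fresh noise (except at the last step).
    coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
    x = (x - coef * eps) / torch.sqrt(alphas[t])
    if t > 0:
        x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
# After the loop, x approximates a sample from the learned image distribution.
```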
The model was trained on LAION-Aesthetics, a subset of the LAION-5B dataset containing 120 million of the full set's nearly 6 billion image-text pairs. LAION datasets are developed to be freely accessible in order to promote a democratized AI development environment.
Stable Diffusion reportedly runs on less than 10 GB of VRAM at inference time and generates 512x512 images in just a few seconds, which means running it on consumer GPUs is a real option.

How do you get access to Stable Diffusion?
Stable Diffusion's public release should happen soon; for now, you can sign up for beta access or apply for access as a research entity. Once you have access, you'll be able to download the model weights from Hugging Face.
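If the weights end up being usable through the Hugging Face diffusers library, generation could look something like the sketch below. The model identifier and arguments here are assumptions rather than confirmed details; half precision (float16) is one common way to stay under the roughly 10 GB of VRAM mentioned above.

```python
# Minimal sketch of running Stable Diffusion via Hugging Face diffusers,
# assuming you have accepted the license and have an access token.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",   # assumed weight repository on Hugging Face
    torch_dtype=torch.float16,         # half precision to reduce VRAM usage
    use_auth_token=True,               # requires a Hugging Face access token
)
pipe = pipe.to("cuda")

prompt = "a photograph of an astronaut riding a horse"
image = pipe(prompt, height=512, width=512, num_inference_steps=50).images[0]
image.save("astronaut.png")
```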
Additionally, here's an interesting interview with Emad Mostaque, founder of Stability AI, by Yannic Kilcher on the purpose behind the initiative, how he responds to concerns about its use, and more.