Stability AI Launches Stable Video Diffusion for Generative Video Creation
Stability expands into video generation! 
Created on November 22 | Last edited on November 22
Stability AI has announced the release of Stable Video Diffusion, a new foundation model for creating videos. This model builds upon the principles of their existing image model, Stable Diffusion, and marks a notable progression in generative AI capabilities.
Stable Video Diffusion is currently in its research preview phase. To support further development and experimentation in the AI community, Stability AI has made the code available on GitHub, with the weights needed to run the model locally accessible via its Hugging Face page. The technical details of the model are covered in an accompanying research paper.
Diffusion
Stable Video Diffusion is a high-resolution latent video diffusion model designed for state-of-the-art text-to-video and image-to-video generation. This model is built upon latent diffusion models used in 2D image synthesis, modified to generate videos by inserting temporal layers and fine-tuning on high-quality video datasets.
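To make the idea of "inserting temporal layers" concrete, here is a minimal sketch of the shape bookkeeping common to this family of models: the frame axis is folded into the batch axis so the pretrained 2D (spatial) layers see each frame as an ordinary image, then activations are regrouped so the inserted temporal layers can mix information across frames. The function names and shapes below are illustrative, not Stability AI's actual code.

```python
def to_spatial(shape):
    """(batch, frames, channels, height, width) -> (batch*frames, c, h, w):
    each frame is treated as an independent image by the 2D layers."""
    b, t, c, h, w = shape
    return (b * t, c, h, w)

def to_temporal(shape, frames):
    """(batch*frames, c, h, w) -> (batch*h*w, frames, c): every spatial
    position becomes a short sequence over the frame axis, which a
    temporal attention or convolution layer can then process."""
    bt, c, h, w = shape
    b = bt // frames
    return (b * h * w, frames, c)

# A 14-frame latent video at 64x64 resolution with 4 latent channels:
video = (1, 14, 4, 64, 64)
print(to_spatial(video))                          # (14, 4, 64, 64)
print(to_temporal(to_spatial(video), frames=14))  # (4096, 14, 4)
```

Alternating between these two views is what lets an existing image model be fine-tuned into a video model without retraining its spatial layers from scratch.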

Flexibility
The model's flexibility is one of its key features. It can be adapted for a variety of video applications, including multi-view synthesis from single images, a capability enhanced through fine-tuning on specific multi-view datasets. Stability AI is also planning to develop a range of models that build upon this foundational technology.

Easy Access
A particularly exciting development is the upcoming web experience, which will feature a Text-To-Video interface. This tool will demonstrate the practical applications of Stable Video Diffusion across several sectors, including advertising, education, and entertainment. Interested users can sign up for the waitlist to gain early access to this new interface.
Next-Level Performance
In terms of performance, Stable Video Diffusion ships as two image-to-video models, generating clips of 14 and 25 frames respectively at frame rates ranging from 3 to 30 frames per second. Initial user preference studies suggest that these models are highly competitive, even outperforming some leading closed models.
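As a rough sense of scale, simple arithmetic on the announced frame counts and frame-rate bounds bounds the clip lengths; the pairings below are illustrative, not quoted from the announcement.

```python
def clip_seconds(frames, fps):
    """Playback duration of a clip with a fixed number of frames."""
    return frames / fps

# Announced extremes: 14 or 25 frames, played back at 3 to 30 fps.
shortest = clip_seconds(14, 30)  # ~0.47 s
longest = clip_seconds(25, 3)    # ~8.33 s
print(shortest, longest)
```

So even at the generous end, these are short clips rather than full scenes, which is consistent with the research-preview framing.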
Currently, Stable Video Diffusion is intended exclusively for research purposes. Stability AI emphasizes that the model is not yet ready for real-world or commercial applications. The company is actively seeking feedback on safety and quality aspects to refine the model for eventual wider release.
The paper: 
The article: 
Tags: ML News