
Make-A-Video: Meta AI's New Model For Text-To-Video Generation

A new text-to-video model, Make-A-Video, has been revealed by Meta AI.
Meta AI researchers today announced a new AI project they've been working on: Make-A-Video, a model for text-to-video generation and a natural next step for content-generating AI models. The release includes a research paper detailing how the model works, along with a collection of sample videos it generated.


How Make-A-Video leveraged text-to-image priors and learned motion unsupervised

For training text-to-image generation models, there are countless datasets of text-image pairs ready to download from the internet; the same is not true for text-video data. On top of that, with so many high-quality text-to-image models already available, training a text-to-video model from scratch could be a waste of time and compute.
With that in mind, the researchers behind Make-A-Video used an existing text-to-image model as the foundation and taught it to understand motion separately, using unlabeled video datasets. This gives the model the full capability of a state-of-the-art text-to-image model for video generation without requiring massive datasets of text-video pairs.
Because the video data the model trains on is unlabeled, the motion components of the model learn by simply observing how things move in the real world, without any textual context.
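To make the two-stage idea more concrete, here is a minimal, purely illustrative PyTorch sketch. A stand-in "image" model is frozen to play the role of a pretrained text-to-image prior, and only a small temporal-attention layer is trained on unlabeled clips by predicting each frame's features from the preceding ones. The module names, tensor shapes, and next-frame objective are all assumptions made for illustration; this is not Make-A-Video's actual architecture or training objective.

```python
import torch
import torch.nn as nn


class TemporalAttention(nn.Module):
    """Self-attention across the time axis of a video feature tensor.

    Hypothetical building block illustrating the general idea of adding
    motion-aware layers on top of a frozen per-frame model.
    """

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels) -- one feature vector per frame
        h = self.norm(x)
        out, _ = self.attn(h, h, h)
        return x + out  # residual connection


class ToyTextToVideo(nn.Module):
    """Wraps a (pretend) pretrained per-frame model with a temporal layer."""

    def __init__(self, channels: int = 64):
        super().__init__()
        # Stand-in for a pretrained text-to-image backbone, applied per frame.
        self.frame_model = nn.Linear(channels, channels)
        for p in self.frame_model.parameters():
            p.requires_grad = False  # keep the image prior frozen
        # Only the temporal layer is trained, on unlabeled video.
        self.temporal = TemporalAttention(channels)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, channels) -- toy per-frame features
        per_frame = self.frame_model(frames)
        return self.temporal(per_frame)


# Toy training loop: learn motion from unlabeled clips by predicting the
# next frame's features from the preceding ones (no text labels involved).
model = ToyTextToVideo()
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
loss_fn = nn.MSELoss()

for step in range(100):
    clip = torch.randn(8, 16, 64)          # stand-in for unlabeled video clips
    inputs, targets = clip[:, :-1], clip[:, 1:]
    preds = model(inputs)
    loss = loss_fn(preds, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The point of the sketch is the division of labor: because the per-frame prior stays frozen, only the comparatively small temporal module needs video data at all, and that data never needs captions.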

Make-A-Video can do more than turn text prompts into short videos: it can create videos from existing images, fill in content between two images, and create variations of existing videos. Head to Make-A-Video's web page to see more examples of what it can do.
As of right now, Make-A-Video is not available to the public; however, Meta seems to plan on gradually making it more accessible as the model improves. You can show your interest in using it in future releases by signing up here.

Find out more

Tags: ML News