
Midjourney Launches Its First Video Model 'V1'

Created on June 19 | Last edited on June 19
Midjourney has laid out a broader vision that stretches far beyond static image generation. The team is aiming toward a future of real-time, open-world AI simulations. In this vision, AI doesn’t just create images—it dynamically generates scenes where characters move, environments evolve, and users can interact freely in 3D space. That kind of system demands a stack of separate, highly capable models: for visuals, for animation, for spatial navigation, and for delivering it all in real-time. The release of a video model marks the second step in assembling this full stack.

Introducing the V1 Video Model

The company’s new V1 Video Model is officially available to the community. While technically still a foundational step, it arrives as a fully usable product: Midjourney has focused on making the release fun, intuitive, visually rich, and relatively affordable, striking a balance between creativity and accessibility. Users can now bring images to life through animation, making this the biggest shift in Midjourney’s product since its initial launch.

Image-to-Video Workflow and Motion Options

The new toolset follows an “Image-to-Video” workflow. Users begin by generating a typical Midjourney image. From there, they can click “Animate” to transform the still image into a motion sequence. For those looking for quick results, the “automatic” mode generates motion prompts on its own and applies them. For more control, a “manual” setting allows users to write their own motion instructions to guide how the scene evolves. There are also two motion levels: “low motion” is intended for more subtle, ambient movement, while “high motion” ramps up the activity across both subject and camera, though at the risk of introducing visual artifacts.
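To summarize the choices a user makes in that workflow, here is a purely illustrative sketch. Midjourney exposes these controls through its web interface, not a public API, so the names below (AnimateSettings, MotionLevel, and so on) are invented for this example only.

```python
# Illustrative only: these types are NOT Midjourney's API; they just model
# the animation options described above (automatic vs. manual motion
# prompts, and low vs. high motion).
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MotionLevel(Enum):
    LOW = "low"    # subtle, ambient movement
    HIGH = "high"  # more subject and camera motion, at higher artifact risk


@dataclass
class AnimateSettings:
    automatic: bool = True                # let the system write the motion prompt
    motion_prompt: Optional[str] = None   # used only in manual mode
    motion: MotionLevel = MotionLevel.LOW

    def validate(self) -> None:
        if not self.automatic and not self.motion_prompt:
            raise ValueError("Manual mode requires a motion prompt.")


# Example: a manual, high-motion animation request
settings = AnimateSettings(
    automatic=False,
    motion_prompt="camera slowly pans right as the subject turns toward the light",
    motion=MotionLevel.HIGH,
)
settings.validate()
```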

Extending Videos and Uploading Custom Images

Users can stretch a single video out further by using the “extend” feature, which adds about four seconds of footage per click, up to a total of four extensions. A key part of the update is that this system also supports outside images. Any external image can be dropped into the prompt bar, designated as a “start frame,” and animated with a custom motion prompt. This opens the door to reanimating photos, illustrations, or previous Midjourney generations, bringing both old and new content into motion.
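As a quick back-of-envelope check, assuming each initial clip runs five seconds (the figure cited in the pricing section below) and each extension adds roughly four seconds, a fully extended clip tops out around 21 seconds:

```python
# Back-of-envelope maximum clip length, assuming a 5-second initial clip
# (see the pricing section) and ~4 extra seconds per extension, up to 4 extensions.
initial_seconds = 5
seconds_per_extension = 4
max_extensions = 4

max_length = initial_seconds + seconds_per_extension * max_extensions
print(f"Maximum extended clip length: ~{max_length} seconds")  # ~21 seconds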

Early Access, Pricing, and Limitations

The video model is launching first on the web and is priced at around eight times the cost of a standard image job. However, since one job creates four 5-second clips, the effective price per second ends up comparable to an image upscale. The team notes that this represents a drastic drop in cost, over 25 times cheaper than earlier video generation tools from other companies. For now, Pro subscribers and above will have access to a relaxed queue mode for video generation. That said, the system is still resource-intensive, and the company is watching usage closely to avoid server overload and to tune pricing as needed.
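The arithmetic behind that claim is simple, shown here with image-job cost normalized to one unit:

```python
# Rough cost arithmetic from the figures above: one video job costs about
# 8x a standard image job and yields four 5-second clips (20 seconds total),
# so the effective cost per second is a small fraction of an image job.
image_job_cost = 1.0                 # normalized: 1 unit per standard image job
video_job_cost = 8 * image_job_cost  # ~8x an image job
seconds_per_video_job = 4 * 5        # four 5-second clips

cost_per_second = video_job_cost / seconds_per_video_job
print(f"Effective cost per second of video: ~{cost_per_second:.1f} image jobs")  # ~0.4
```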

Looking Ahead

The rollout of this video model isn’t just about the product itself—it’s part of a larger roadmap. Midjourney is trying to modularize the development of real-time simulation: visuals, animation, spatial navigation, and performance optimization. Over time, these modules will be unified. What begins as a fun animation tool could eventually evolve into a real-time, navigable AI environment. For now, the emphasis is on exploration, experimentation, and feedback. According to Midjourney, this model isn’t the endgame, but an important next step—and a sign of what’s coming.
Tags: ML News