
Google's Imagen: An Answer To OpenAI's DALL·E 2 For Text-To-Image Generation

Google has revealed a text-to-image generation model to rival that of DALL·E 2, generating photorealistic images with remarkable consistency, variety, and accuracy.
Created on May 24 | Last edited on May 24
It was only last month that OpenAI released DALL·E 2, a groundbreaking improvement in AI-powered text-to-image generation. Now, only a month and a half after DALL·E 2's announcement, Google has stepped up to the plate with a text-to-image project of its own: Imagen.
Imagen is the next step in text-to-image model architecture, generating highly photorealistic images with consistency comparable to DALL·E 2's.


How does Imagen work?

The key difference between Imagen and other text-to-image models is its focus on the size of the language model that processes the prompt. The Google researchers behind Imagen found that scaling up this language model made generated images match their prompts more accurately and even improved image fidelity, more so than scaling up the image generation model itself.
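Imagen itself isn't public, but we can illustrate this text-encoding stage with Hugging Face's openly available T5 models. This is a minimal sketch, not Imagen's actual code: the paper uses a frozen T5-XXL encoder, while we load the small variant here to keep the example light.

```python
import torch
from transformers import T5Tokenizer, T5EncoderModel

# Imagen uses a frozen T5-XXL encoder; "t5-small" stands in for it here.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
encoder = T5EncoderModel.from_pretrained("t5-small")
encoder.requires_grad_(False)  # the text encoder stays frozen during training

prompt = "A photo of a corgi riding a skateboard in Times Square"
tokens = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # One embedding per token; the image models condition on this sequence.
    text_embeddings = encoder(**tokens).last_hidden_state

print(text_embeddings.shape)  # (batch, seq_len, 512) for t5-small
```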
The whole process for creating the images is split into a few steps (sketched in code after the list):
  • First, the language model (pre-trained on text-only data) tokenizes the text input and encodes it into a sequence of embeddings that the later stages can interpret.
  • Next, the actual image generation model, a text-conditional diffusion model, takes those text embeddings and generates a 64x64 image.
  • Following that, both the 64x64 image and the text embeddings are fed through two super-resolution models that upscale the image to 256x256, then 1024x1024. Because the text embeddings are fed into them as well, the upscaling models have the context they need to keep their output matching the original prompt.
  • Finally, the generated image matching our text prompt is complete.
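Putting the steps together, here's a schematic of the three-stage cascade in Python. The functions are illustrative stand-ins (the real stages are large diffusion models that Google hasn't released); only the shapes and the data flow reflect the paper:

```python
import torch

def frozen_text_encoder(prompt: str) -> torch.Tensor:
    # Stand-in for the frozen T5-XXL encoder: one embedding per token.
    return torch.randn(1, 16, 4096)

def base_diffusion_model(text_emb: torch.Tensor) -> torch.Tensor:
    # Stand-in for the text-conditioned 64x64 base diffusion model.
    return torch.randn(1, 3, 64, 64)

def super_resolution(image: torch.Tensor, text_emb: torch.Tensor,
                     size: int) -> torch.Tensor:
    # Stand-in for the text-conditioned super-resolution diffusion models.
    return torch.randn(1, 3, size, size)

text_emb = frozen_text_encoder("A corgi riding a skateboard in Times Square")
image_64 = base_diffusion_model(text_emb)                  # 64x64 generation
image_256 = super_resolution(image_64, text_emb, 256)      # 64 -> 256 upscale
image_1024 = super_resolution(image_256, text_emb, 1024)   # 256 -> 1024 upscale
print(image_1024.shape)  # torch.Size([1, 3, 1024, 1024])
```

Note how the text embeddings are passed to every stage, not just the first: that's what lets the upscalers stay faithful to the prompt rather than blindly sharpening pixels.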

A good breakdown of Imagen's performance can be found in this Twitter thread on the research paper:


Making images with Imagen

Unfortunately, as with many fun-looking state-of-the-art machine learning models, we're not allowed to play with it. There's no statement on whether we'll ever get to, or even a portal for researchers to gain access, but we can hope.
For now, we can still appreciate the beauty and chaos of the sample images provided.

(More can be found on the project site and in the research paper.)


Tags: ML News