A Guide to Prompt Engineering for Stable Diffusion
A comprehensive guide to prompt engineering for generating images using Stable Diffusion, HuggingFace Diffusers and Weights & Biases.
Created on October 24 | Last edited on December 7
The popularity of text-conditional image generation models like DALL·E 3, Midjourney, and Stable Diffusion can largely be attributed to their ease of use: they produce stunning images from nothing more than a meaningful text prompt. Despite this ease of use, however, these are machine learning models with questionable "intelligence," so it's quite natural that they often have a tough time understanding exactly what we mean by our prompts.
The best way to use these models is to treat them like a fancy paintbrush rather than magical spells from Hogwarts that can read your mind and create your reality.

In this report, we'll explore:
- How to establish a simple image generation workflow using the 🧨 Diffusers library by 🤗 HuggingFace. For the scope of this report, we will stick primarily to the Stable Diffusion family of models.
- How to manage and visualize our image generation experiments with Weights & Biases.
- The most basic "prompt engineering" technique for dramatically improving the quality of your images.
- How to properly structure a great prompt for diffusion models.
- How to further elevate our generated images using negative prompts.
A Sneak Peek At The Code & Results
Despite naming our pursuit "prompt engineering," the techniques involved are more artistic in nature than scientific. The process of generating images using diffusion models is a craft of its own; a craft that can be mastered in a matter of minutes, unlike traditional artistic endeavors that often take years to master.
As a note, you can run the code in this report via this Colab Notebook:
Alternatively, jump on this HuggingFace Space to start crafting your prompts using an interactive application 👇
And, since this is a GenAI report, we know what you want upfront: some stunning images to get you started. We got you.
Some stunning images generated using Stable Diffusion XL. Keep reading to learn more about how you can bring your own ideas to life.
This article was written as a Weights & Biases Report, which is a project management and collaboration tool for machine learning projects. Reports let you organize and embed visualizations, describe your findings, share updates with collaborators, and more. To learn more about reports, check out Collaborative Reports.
💻 The Image Generation Workflow
In this section, we will set up a basic workflow to generate images using Stable Diffusion XL 1.0 with the 🧨 Diffusers library by 🤗 HuggingFace. We will also show how to use the Weights & Biases integration for Diffusers to automatically keep track of all aspects of our image generation experiments, including prompts, configs, and images.
# Install all the dependencies
!pip install diffusers accelerate transformers wandb

# Imports
import torch
from diffusers import DiffusionPipeline
from wandb.integration.diffusers import autolog

# Define the Stable Diffusion XL pipeline
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)

# Set the device for the pipeline
pipeline = pipeline.to("cuda")

# Uncomment the following line if you encounter OOM issues
# pipeline.enable_sequential_cpu_offload()

# Set the prompt
prompt = "children dressed like a gangster"

# Set the negative prompt, or leave it `None` if you don't want to use it
negative_prompt = None

# Setting a seed ensures reproducibility of the experiment
generator = torch.Generator(device="cuda").manual_seed(42)

# Call the W&B autolog for Diffusers; it automatically logs prompts,
# settings, and generated images for every pipeline call
autolog(init=dict(project="diffusers-prompt-engineering"))

# Execute the pipeline and grab the generated image
image = pipeline(
    prompt,
    negative_prompt=negative_prompt,
    generator=generator,
    height=1024,
    width=1024,
).images[0]
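With autolog enabled, the prompt, negative prompt, pipeline configuration, and the resulting image are all tracked in your W&B project automatically. If you also want a local copy of the image, here is a minimal sketch (the filename is arbitrary):

# Save the generated PIL image to disk
image.save("generated_image.png")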
Want to run the code right away and generate your own images using Stable Diffusion? Run the following colab notebook 👇
Alternatively, jump on this HuggingFace Space to start crafting your prompts using an interactive application 👇
😈 The Devil's in the Details
The first and simplest thing to keep in mind while generating images using diffusion models is that we need to communicate our idea to the model as clearly as possible. This calls for highly descriptive prompts that describe the subject and scene in detail, helping the model generate accurate images.
Let's look at an example of such a prompt...
An example showcasing how the addition of simple details to a bare-bones prompt idea elevates the quality of the generated image.
Note that although the prompt needs to be descriptive in nature, it does not need to be grammatically accurate; it only needs to be coherent enough for the model to understand the idea and render it.
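To make this concrete, here is a minimal sketch comparing a bare-bones prompt with a more descriptive version of the same idea, reusing the pipeline defined above. The wording of the detailed prompt is purely illustrative:

# A bare-bones prompt that only states the core idea
barebones_prompt = "children dressed like a gangster"

# The same idea, enriched with medium, style, lighting, and detail keywords
descriptive_prompt = (
    "a hyper realistic photograph of children dressed like a gangster, "
    "cinematic lighting, shallow depth of field, highly detailed"
)

# Fix the seed so that the only difference between the two runs is the prompt
generator = torch.Generator(device="cuda").manual_seed(42)
barebones_image = pipeline(barebones_prompt, generator=generator).images[0]

generator = torch.Generator(device="cuda").manual_seed(42)
descriptive_image = pipeline(descriptive_prompt, generator=generator).images[0]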
🫀 The Anatomy of a Good Prompt
Since we have already established that the craft of prompt engineering is more of an artistic pursuit, it is difficult to settle on a definite prompt format that would guarantee good results. However, if we look at the descriptive prompt we used above to generate a beautiful Stable Diffusion image, we will notice that the key phrases added to the base idea fall into a few specific categories:
- The Subject is the core idea behind the image. In the aforementioned demonstration, we used the subject "children dressed like a gangster".
- The Medium that was used to deliver the artwork. Medium has a strong effect because one keyword alone can dramatically change the style. Some examples are illustration, oil painting, 3d rendering, photography, cinematic shot, etc. In the aforementioned demonstration, we used the medium "photograph".
- The Artistic Style that you want to capture, for example, impressionist, surrealist, gothic, art deco, etc. In the aforementioned demonstration, we used the style "hyper realistic". Note that we can also mention multiple styles in the prompt and experiment with mixing and matching them.
- The Names of Artists whose unique aesthetic styles you might want to mimic. For example, you can add phrases like "in the style of Van Gogh", "in the style of Akira Kurosawa", "Studio Ghibli" or even "Abanindranath Tagore". You can also pass names of famous modern artists with unique styles from sites like Artstation, DeviantArt, etc.
- Lighting and Color Palette are important aspects of any artwork that can be specified using phrases like surrealist lighting, dark lighting, muted color palette, vibrant, saturated, etc.
- Additional Details such as names of cameras, names of rendering techniques, rendering engines, and resolution can also be included depending on the vision of the artist.
All the categories of key phrases mentioned in this section are loose definitions rather than strict categories. Some of them, namely Subject, Medium, and Artistic Style, are the most important building blocks of a prompt and should be included by default; the others are more or less optional and depend upon the vision of the user.
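As a rough sketch, these building blocks can simply be strung together into a single prompt string. The phrases below are purely illustrative and can be swapped for whatever fits your vision:

# Loose building blocks of a prompt, following the anatomy described above
subject = "children dressed like a gangster"          # the core idea
medium = "photograph"                                  # the medium of the artwork
style = "hyper realistic"                              # the artistic style
artist = "in the style of Akira Kurosawa"              # optional: an artist to mimic
lighting = "cinematic lighting, muted color palette"   # optional: lighting and colors
details = "shot on a DSLR, highly detailed"            # optional: additional details

# String the key phrases together into a single descriptive prompt
prompt = ", ".join([subject, medium, style, artist, lighting, details])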
Taking this simple insight, let us try to generate a few more images using Stable Diffusion XL 1.0...
Some more examples of descriptive prompts dramatically improving the quality of generated images
😉 A Little Negativity Goes a Long Way
A negative prompt can be crucial to strong prompt engineering: it provides a way for us to specify what we don't want to see in the generated image, without any extra input. In newer models such as Stable Diffusion XL, negative prompts may not be as crucial as the prompt itself, but they can certainly help prevent the generation of strange images. Mostly, we use a negative prompt as a string of keywords representing unwanted features that we want to remove from our generated images.
Let's try to demonstrate negative prompts using a simple example...
A simple example of negative prompts.
It is always advisable to use a generic set of negative prompts in your experiments: they not only provide more context to the diffusion model but also help eliminate basic flaws in the images, such as deformed fingers, eyes, limbs, and other anatomical features, and help avoid monochrome images or dull color palettes.
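As a sketch, such a generic negative prompt is just a comma-separated string of unwanted keywords passed alongside the prompt. The exact keyword list below is only an illustrative assumption:

# A generic set of negative keywords targeting common flaws
negative_prompt = (
    "deformed, disfigured, bad anatomy, extra fingers, extra limbs, "
    "blurry, low quality, monochrome, dull colors"
)

# Generate with the same prompt and seed as before, but with the negative prompt
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipeline(
    prompt,
    negative_prompt=negative_prompt,
    generator=generator,
).images[0]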
The following panel demonstrates how using a basic set of negative prompts significantly improves the quality of the generated images 👇
Some examples of a generic set of negative prompts improving the quality of the generated images and fixing flaws.
Not just that, a good set of negative prompts can sometimes eliminate the burden of writing descriptive prompts, as demonstrated by the example below 👇
A good set of negative prompts can often reduce the burden of highly descriptive prompts.
🎨 Iterative Refinement of Diffusion Models with Prompt Engineering
Many artists propose iterative drawing, or iterative refinement, as a great way to improve one's artistic craft. It is simply the technique of manually refining the artwork over multiple iterations. With each iteration, one might add details to the prompt or keywords to the negative prompt, depending on the image generated in the previous iteration.
A specific mode of iterative refinement is also referred to as beg crafting by HuggingFace user bji in this thread. In this process, the user is essentially begging the model for what they want to see, while in fact all they're really doing is introducing some random factors into how it denoises. The effects of prompt changes are statistical, so the user has to generate many results from a prompt to tell whether a change had a statistically significant effect.
The iterative refinement process is even more relevant for AI-generated art than traditional art, given the finicky nature of machine learning models. One could try out multiple iterations to find the perfect set of prompts, negative prompts, seeds, or other settings in the diffusion pipeline to achieve their vision.
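A minimal sketch of such a loop is shown below. We generate a few samples per prompt variant (as the "beg crafting" note above suggests, a single sample rarely tells you whether a change helped), keeping the negative prompt fixed; the prompt variants themselves are purely illustrative:

# Successive refinements of the same idea (illustrative only)
prompt_iterations = [
    "a castle on a hill",
    "a photograph of a castle on a hill at sunset",
    "a hyper realistic photograph of a medieval castle on a hill at sunset, "
    "dramatic lighting, highly detailed",
]
negative_prompt = "blurry, low quality, deformed"

for iteration, prompt in enumerate(prompt_iterations):
    # Generate a few samples per prompt to judge the effect of the change
    for seed in [0, 1, 2]:
        generator = torch.Generator(device="cuda").manual_seed(seed)
        image = pipeline(
            prompt,
            negative_prompt=negative_prompt,
            generator=generator,
        ).images[0]
        image.save(f"iteration_{iteration}_seed_{seed}.png")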
To learn more about iterative refinement in a traditional artistic medium, check out this video. We also recommend this example of iterative refinement using MidJourney by Jeff Barry.
Here's a demonstration of how we can apply iterative refinement to steadily improve our prompt over multiple iterations to finally reach our vision using Stable Diffusion 2.1.
An example of refining the artwork over multiple iterations of prompts, negative prompts, and even resolution and aspect ratio, to generate the perfect image that aligns with the artist's requirement or vision.
🙏 Acknowledgements
- This report wouldn't have been possible without the constant stream of advice and support from Sayak Paul.
- Thanks to Arindam Das and Atanu Sarkar whose prompting experiments helped me a lot with this report.
- I was hugely inspired to write this report after taking the Crash course in generative AI: Stable Diffusion, DALL·E & Midjourney by PromptHero. Adapting what I learned from this course in terms of MidJourney and A1111 to Diffusers is turning out to be a fun exercise.
- The works of PromptHero community members FAngel, Atheist, and ArtEnthusiast inspired me a lot in crafting the prompts showcased in this report.
- The article Stable Diffusion prompt: a definitive guide served as an excellent reference for subject-driven prompt crafting, which is the prompting technique discussed in this report.
🏁 Conclusion
- In this report, we discuss the art and science of prompt engineering for diffusion models, focusing primarily on the Stable Diffusion family of models.
- First, we establish a simple image generation workflow using the 🧨 Diffusers library by 🤗 HuggingFace. We also demonstrate the use of the Weights & Biases integration for Diffusers to automatically keep track of all aspects of our image generation experiments, including prompts, configs, and images.
- We explore the most basic prompt engineering technique: appending descriptive phrases to a base prompt idea to dramatically improve the quality of our images.
- We then explore what elements and phrases constitute the anatomy of a good prompt.
- We also explore the use of negative prompts, both to eliminate unwanted features from our images and to improve their overall quality.
- Finally, we explore the process of iterative refinement applied to generating images using machine learning models.