Making My Kid a Jedi Master With Stable Diffusion and Dreambooth
In this article, we'll explore how to fine-tune Stable Diffusion with Dreambooth to transform my son into his favorite Star Wars character.
Created on October 18|Last edited on December 19
There are two things you should know about this project:
1) You're likely familiar with Stable Diffusion by now. It's a new deep-learning model that generates images from a prompt.
2) This week was my oldest son's birthday. I thought it would be fun to give him the gift of a promotion to Jedi Master.

"Portrait of heccap16 as a young Jedi master"
But of course, using the prompt "my kid as a Jedi Master" isn't going to work. Stable Diffusion knows celebrities and athletes and politicians, but it doesn't know my son. Not until I can teach it what he looks like, that is.
That is exactly what we are going to do in this article — teach Stable Diffusion how to turn my son into his favorite Star Wars character using Dreambooth. Here's what we'll cover:
Table of Contents
Fine-Tuning Stable Diffusion
Teaching the Model Who Hector Is
Fine-Tuning the Text Encoder Does Make a Difference
More
Let's use The Force!
Fine-Tuning Stable Diffusion
So how do we do this? A technique that emerged a couple of weeks ago is Dreambooth, which teaches the model a new word by showing it the corresponding images.
Basically, you pass a bunch of different images of someone (or something) to the model, and it learns to depict that person or thing. To make it easier for the model to understand this new "concept," you also give the model the concept's class. In this article's case, the concept is my son.

I used the HuggingFace diffusers library and default example training script. I asked for help on the fastai discord and got really good feedback. I also found some insights on the Stable Diffusion Dreambooth discord channel.
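For reference, the diffusers repo ships the Dreambooth example as a script called train_dreambooth.py, launched with accelerate. A launch command along these lines is what the default example looks like; the paths, model ID, and hyperparameter values below are illustrative placeholders, not the exact settings I used:

```shell
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --instance_data_dir="./photos-of-hector" \
  --class_data_dir="./photos-of-boys" \
  --instance_prompt="a photo of heccap16 boy" \
  --class_prompt="a photo of a boy" \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --max_train_steps=800 \
  --output_dir="./dreambooth-heccap16"
```

The instance prompt pairs the new token with its class, while the class prompt plus prior preservation keeps the model from forgetting what ordinary members of that class look like.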
I instrumented the training with a simple callback similar to what my colleague Scott shared in this Twitter thread:
Then, just run inference and store your model predictions in a wandb.Table. In the latest version of diffusers, you can even set up a callback and log your training metrics (even if they have little meaning for this fine-tuning task).
My Code and GPU
I used a 16GB GPU for inference and a 40GB GPU for training.
```python
import wandb
from diffusers import StableDiffusionPipeline

# Load the fine-tuned pipeline from wherever training saved the weights
pipe = StableDiffusionPipeline.from_pretrained("./dreambooth-heccap16").to("cuda")

def run_prompt(prompt, bs=4):
    config.prompt = prompt
    images = []
    table = wandb.Table(columns=["prompt", "image"])
    with wandb.init(project="hector", config=config):
        # Generate images in batches, then log them all to one W&B table
        for _ in range(bs // 4):
            images += pipe(
                [prompt] * bs,
                num_inference_steps=config.num_inference_steps,
                guidance_scale=config.guidance_scale,
            ).images
        for img in images:
            table.add_data(prompt, wandb.Image(img))
        wandb.log({"predictions": table})
```
Teaching the Model Who Hector Is
We need to teach the model who Hector (my son) is, but the model already knows a bunch of (lesser) Hectors. So we're going to call my son something more unique: I ended up using heccap16 as the token, and it appears to have worked. I also tried different classes for this concept (kid, boy, person, small person), but boy appears to work best.
Does it work? Kind of. It's not as easy as it sounds, and there is a lot of tweaking.
- First, you need a varied dataset (not necessarily huge!) with 20 to 30 pictures against different backgrounds, with good lighting and exposure.
- You need to train for enough steps; we'll see this in more detail.
- Lastly, you need to construct the prompt correctly. Once you've taught the model the new concept, it works better to pass the concept's class explicitly in the prompt.
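To make that last point concrete, here is a minimal sketch of prompt construction. The token heccap16 and class boy come from this article; the helper function itself is purely illustrative:

```python
def build_prompt(description, token="heccap16", cls="boy"):
    """Insert the learned token followed by its class into a prompt template.

    Passing the class right after the token ("heccap16 boy") tends to
    anchor the new concept better than the token alone.
    """
    return description.format(subject=f"{token} {cls}")

prompt = build_prompt("portrait of {subject} as a young Jedi master")
# -> "portrait of heccap16 boy as a young Jedi master"
```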
You need to generate a lot of images to check how the model is performing.
Fine-Tuning the Text Encoder Does Make a Difference
You can now pass the flag --train_text_encoder and it will train the CLIP text encoder alongside the UNet, which makes a noticeable difference in the quality of the model's predictions!
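Assuming the same diffusers example script as above, enabling this is just one extra flag on the launch command; note that training both models needs noticeably more GPU memory (the values shown are placeholders):

```shell
# Same launch as before, with text-encoder training enabled
accelerate launch train_dreambooth.py \
  --train_text_encoder \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --instance_prompt="a photo of heccap16 boy" \
  --output_dir="./dreambooth-heccap16-te" \
  # ...remaining flags as in the earlier command
```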
More
I am still experimenting with this, and it is a very fast-moving field. I recommend following the fastai part 2 course that Jeremy Howard is teaching right now; he is covering the cutting edge of the state of the art in diffusion models.