
Training Journal: DreamBooth Torta Fine-tuning

Created on December 23 | Last edited on January 5


My training journal for the Hugging Face DreamBooth fine-tuning hackathon. Here I'll be fine-tuning on images of tortas, a delicious variety of sandwich from Mexico!

Summary

  • Adding a "class of thing" to the training and inference prompts unlocked the best results with SD v1-4, e.g. training on "torta sandwich" instead of just "torta"
  • Training at higher learning rates, like in LastBen's colab, works for unique concept names like "Ae68tVcVmwwQ", but also makes it easier to overfit
  • Tried:
    • more steps, didn't work, overfitted
    • more unique concept names such as "zztortazz" and "Ae68tVcVmwwQ" didn't work so well; possibly the text encoder needed to be trained too
    • the diffusers library training script, train_dreambooth.py, with similar settings; no big difference

Tasks

  • Fine-tune Stable Diffusion 1.4 on images of tortas
  • Explore guidance_scale settings on 1 prompt
  • Explore guidance_scale settings on 5 more diverse prompts
  • Fine-tune Stable Diffusion 2.1 on images of tortas

Create a dataset of images of tortas

Create a dataset of tortas

Uploaded a folder of screenshots of tortas, taken from Google review images of torta stands in Mexico City.
Tasty Tortas


Upload to HF Hub for use with the datasets library
from datasets import load_dataset

# load the local image folder and push it to the Hugging Face Hub
dataset = load_dataset("imagefolder", data_dir="data/tortas")
dataset.push_to_hub("tortas")

Fine-tune Stable Diffusion

Fine-tune using the excellent DreamBooth hackathon colab provided:
Colab here

W&B Tables logging code

# start a wandb run
import wandb
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "morgan/torta",
    torch_dtype=torch.float16,
).to("cuda")

num_cols = 6
name_of_your_concept = "torta"
prompt = f"a photo of a green field full of {name_of_your_concept}s"

# one image column per generated sample
img_ls = [f"im_{i}" for i in range(num_cols)]

cols = ["concept", "prompt", "guidance_scale"]
cols.extend(img_ls)

# start a wandb run and create a Table
# (`args` here is the Namespace of training arguments from the notebook)
wandb.init(entity="morgan", project="hf-dreambooth", config=vars(args))
tbl = wandb.Table(columns=cols)

for guidance_scale in range(8, 12):
    all_images = []
    for _ in range(num_cols):
        images = pipe(prompt, guidance_scale=guidance_scale).images
        all_images.extend(images)

    # add a row of data to the wandb table
    tbl.add_data(
        name_of_your_concept,
        prompt,
        guidance_scale,
        *(wandb.Image(img) for img in all_images),
    )

# log the wandb table
wandb.log({"tortas-table": tbl})
wandb.finish()

Stable Diffusion v1-4

Dec 23, 17:50 - 1 row logged

Dec 23, 18:08 - samples logged, too much "green field" generated

Dec 23, 18:20 - Modified prompt, better!

New prompt: "lots of tortas in a green field" - much better, as more of the samples contain tortas.
However, with guidance_scale from 7 to 12 there are still occasionally samples without any tortas.
Also, some generated tortas seem to have more green fillings (maybe lettuce/spinach/cucumber/avocado) than tortas typically have - I count 3 out of 37 training images with bits of green. The model might be drawing too many parallels between tortas and sandwiches and/or hamburgers.
Ideas:
  • Try more sampling with higher guidance_scale
  • Train for longer



Dec 23, 18:35 - Higher guidance scale testing

Dec 23, 21:25 - Using more diverse prompts

Testing with 5 different torta prompts:
  • "a photo of lots of tortas in a green field"
  • "a sketch drawing of a torta floating through space"
  • "a torta being eaten by a sheep, by Claude Monet"
  • "a torta watching tv, pixar animation, artstation, realistic, 3d, 8k, unreal engine"
  • "a happy torta on a postage stamp"
Mixed results here
Idea: I wonder if training with a more unique concept name will yield better results, will train with zztortazz as the concept name.
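(The unique identifiers in this journal were just picked by hand; a quick way to generate one, purely illustrative and not part of the hackathon code, would be:)

```python
import random
import string

def rare_token(length: int = 12) -> str:
    """Generate a random letters-only identifier unlikely to collide
    with real words, in the spirit of "zztortazz" or "Ae68tVcVmwwQ"."""
    return "".join(random.choices(string.ascii_letters, k=length))

print(rare_token())  # different every call
```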



Dec 23, 22:10 - Re-trained with more unique concept, zztortazz - no good

Fine-tuned SD with "zztortazz" as the concept instead of "torta".
Looks like it's worse than "torta"; I guess the model already has a reference for what tortas are.



Dec 31, 11:55 - Re-trained 800 steps - too much torta?

Dec 31, 12:20 - Trained zztortazz for 800 steps - not great, doesn't get the concept

Dec 31, 12:35 - Trained Ae68tVcVmwwQ for 1400 steps - no Tortas, prob needs text encoder training too

Trained with a concept called Ae68tVcVmwwQ instead, to see if a more unique concept might help
  • Completely lost the concept here, I guess the new token needs to be trained into the text encoder



Jan 1, 11:25 - Used diffusers' original train_dreambooth.py script - No visible improvement
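For reference, the invocation was along these lines (a sketch only: the model id, paths, and hyperparameters below are illustrative placeholders, not a record of the exact run). Note that adding --train_text_encoder is the switch that would also train the text encoder, which might help with the rare-token concepts above:

```shell
# Illustrative invocation sketch - paths and values are placeholders
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --instance_data_dir="data/tortas" \
  --instance_prompt="a photo of a torta" \
  --output_dir="torta-model" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=2e-6 \
  --max_train_steps=400
```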

Jan 1, 11:50 - Tried adding the "sandwich" class to the training prompt - success!

  • Adding "sandwich" as a class of thing to the training prompt ("a photo of a torta sandwich") and the inference prompt really improved the results!
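The change itself is tiny; as a sketch (a hypothetical helper, not the hackathon code), the training and inference prompts just gain a class noun:

```python
from typing import Optional

def instance_prompt(concept: str, class_noun: Optional[str] = None) -> str:
    """Build a DreamBooth instance prompt, optionally adding a class noun
    so the model can anchor the new concept to something it already knows."""
    subject = f"{concept} {class_noun}" if class_noun else concept
    return f"a photo of a {subject}"

print(instance_prompt("torta"))              # a photo of a torta
print(instance_prompt("torta", "sandwich"))  # a photo of a torta sandwich
```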



Stable Diffusion v1-5

Jan 1, 16:20 - Trained Ae68tVcVmwwQ using LastBen's fast-stable-diffusion colab - overfitted

  • Used colab defaults with captions of "a photo of a Ae68tVcVmwwQ sandwich", i.e. 650 UNet steps and UNet lr of 2e-5 (diffusers colab is 400/2e-6)
  • Looks to be a little overfitted, it was trained with 650 UNet steps at 2e-5, no text encoder training
  • Running inference again at guidance_scale = 7 (down from 11) looks a little better, but still very overfitted
  • Try:
    • Train again with 400 UNet steps



Jan 1, 17:05 - Trained Ae68tVcVmwwQ using LastBen's fast-stable-diffusion, 400 steps - still overfitting

Jan 1, 17:20 - Trained Ae68tVcVmwwQ using LastBen's 400 steps and 2e-6 - maybe too weak a lr

Jan 1, 17:40 - Trained Ae68tVcVmwwQ using LastBen's 400 steps, 2e-6, with text encoder train - better, but still weak

  • Trained with 400 steps and 2e-6, text encoder 150 and 1e-6
  • Looks like it is still undertrained, will try with more steps for the UNet and Text Encoder



Jan 1, 18:30 - Trained Ae68tVcVmwwQ using LastBen's, w/text encoder, more training - still kinda weak, not enough tortas

Jan 1, 18:50 - Trained Ae68tVcVmwwQ using LastBen's, w/text encoder, lr tweaking - better!




Jan 1, 19:15 - Trained Ae68tVcVmwwQ using LastBen's, w/text encoder, inc steps again -





Jan 1, 19:45 - Trained "torta sandwich" using LastBen's, w/text encoder -