
Training Journal: DreamBooth Torta Fine-tuning

Created on December 23 | Last edited on January 5


My training journal for the Hugging Face DreamBooth fine-tuning hackathon. Here I'll be fine-tuning on images of tortas, a delicious variety of sandwich from Mexico!

Summary

  • Adding a "class of thing" to the training and inference prompts unlocked the best results with SD v1-4, e.g. training on "torta sandwich" instead of just "torta"
  • Training at higher learning rates, like in LastBen's colab, works for unique concept names like "Ae68tVcVmwwQ", but also makes it easier to overfit
  • Tried:
    • more steps, didn't work, overfitted
    • more unique concept names such as "zztortazz" and "Ae68tVcVmwwQ" didn't work so well; possibly the text encoder needed to be trained too
    • the diffusers library training script, train_dreambooth.py, with similar settings; no big difference

Tasks

  • Fine-tune Stable Diffusion 1.4 on images of tortas
  • Explore guidance_scale settings on 1 prompt
  • Explore guidance_scale settings on 5 more diverse prompts
  • Fine-tune Stable Diffusion 2.1 on images of tortas

Create a dataset of images of tortas

Create a dataset of tortas

Uploaded a folder of screenshots of tortas, taken from Google review images of torta stands in Mexico City.
Tasty Tortas


Upload to HF Hub for use with the datasets library
from datasets import load_dataset

# load the local image folder and push it to the Hugging Face Hub
dataset = load_dataset("imagefolder", data_dir="data/tortas")
dataset.push_to_hub("tortas")

Fine-tune Stable Diffusion

Fine-tune using the excellent DreamBooth hackathon colab provided:
Colab here

W&B Tables logging code

# start a wandb run
import wandb
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "morgan/torta",
    torch_dtype=torch.float16,
).to("cuda")

num_cols = 6
name_of_your_concept = "torta"
prompt = f"a photo of a green field full of {name_of_your_concept}s"

# one image column per generated sample
img_ls = [f"im_{i}" for i in range(num_cols)]

cols = ["concept", "prompt", "guidance_scale"]
cols.extend(img_ls)

# start a wandb run and create a Table
# (`args` here is the Namespace of training arguments from the notebook)
wandb.init(entity="morgan", project="hf-dreambooth", config=vars(args))
tbl = wandb.Table(columns=cols)

for guidance_scale in range(8, 12):
    all_images = []
    for _ in range(num_cols):
        images = pipe(prompt, guidance_scale=guidance_scale).images
        all_images.extend(images)

    # add a row of data to the wandb table
    tbl.add_data(
        name_of_your_concept,
        prompt,
        guidance_scale,
        *(wandb.Image(img) for img in all_images),
    )

# log the wandb table
wandb.log({"tortas-table": tbl})
wandb.finish()

Stable Diffusion v1-4

Dec 23, 17:50 - 1 row logged

Dec 23, 18:08 - samples logged, too much "green field" generated

Dec 23, 18:20 - Modified prompt, better!

New prompt: "lots of tortas in a green field" - much better, as more of the samples contain tortas.
However, with guidance_scale from 7 to 12 there are still occasionally samples without any tortas.
Also, some generated tortas seem to have more green fillings (maybe lettuce/spinach/cucumber/avocado) than tortas typically have - I count 3 out of 37 training images with bits of green. The model might be drawing too many parallels between tortas and sandwiches and/or hamburgers.
Ideas:
  • Try more sampling with higher guidance_scale
  • Train for longer



Dec 23, 18:35 - Higher guidance scale testing

Dec 23, 21:25 - Using more diverse prompts

Testing with 5 different torta prompts:
  • "a photo of lots of tortas in a green field"
  • "a sketch drawing of a torta floating through space"
  • "a torta being eaten by a sheep, by Claude Monet"
  • "a torta watching tv, pixar animation, artstation, realistic, 3d, 8k, unreal engine"
  • "a happy torta on a postage stamp"
Mixed results here
Idea: I wonder if training with a more unique concept name will yield better results, will train with zztortazz as the concept name.
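(The unique identifiers in this journal were just picked by hand; a quick way to generate one, purely illustrative and not part of the hackathon code, would be:)

```python
import random
import string

def rare_token(length: int = 12) -> str:
    """Generate a random letters-only identifier unlikely to collide
    with real words, in the spirit of "zztortazz" or "Ae68tVcVmwwQ"."""
    return "".join(random.choices(string.ascii_letters, k=length))

print(rare_token())  # different every call
```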



Dec 23, 22:10 - Re-trained with more unique concept, zztortazz - no good

Fine-tuned SD with "zztortazz" as the concept instead of "torta".
Looks like it's worse than "torta"; I guess the model already has a reference for what tortas are.



Dec 31, 11:55 - Re-trained 800 steps - too much torta?

Dec 31, 12:20 - Trained zztortazz for 800 steps - not great, doesn't get the concept

Dec 31, 12:35 - Trained Ae68tVcVmwwQ for 1400 steps - no Tortas, prob needs text encoder training too

Trained with a concept called Ae68tVcVmwwQ instead, to see if a more unique concept might help
  • Completely lost the concept here, I guess the new token needs to be trained into the text encoder



Jan 1, 11:25 - Used diffusers' original train_dreambooth.py script - No visible improvement
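For reference, the invocation was along these lines (a sketch only: the model id, paths, and hyperparameters below are illustrative placeholders, not a record of the exact run). Note that adding --train_text_encoder is the switch that would also train the text encoder, which might help with the rare-token concepts above:

```shell
# Illustrative invocation sketch - paths and values are placeholders
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --instance_data_dir="data/tortas" \
  --instance_prompt="a photo of a torta" \
  --output_dir="torta-model" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=2e-6 \
  --max_train_steps=400
```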

Jan 1, 11:50 - Tried adding the "sandwich" class to the training prompt - success!

  • Adding "sandwich" as a class of thing to the training prompt ("a photo of a torta sandwich") and the inference prompt really improved the results!
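The change itself is tiny; as a sketch (a hypothetical helper, not the hackathon code), the training and inference prompts just gain a class noun:

```python
from typing import Optional

def instance_prompt(concept: str, class_noun: Optional[str] = None) -> str:
    """Build a DreamBooth instance prompt, optionally adding a class noun
    so the model can anchor the new concept to something it already knows."""
    subject = f"{concept} {class_noun}" if class_noun else concept
    return f"a photo of a {subject}"

print(instance_prompt("torta"))              # a photo of a torta
print(instance_prompt("torta", "sandwich"))  # a photo of a torta sandwich
```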



Stable Diffusion v1-5

Jan 1, 16:20 - Trained Ae68tVcVmwwQ using LastBen's fast-stable-diffusion colab - overfitted

  • Used colab defaults with captions of "a photo of a Ae68tVcVmwwQ sandwich", i.e. 650 UNet steps and UNet lr of 2e-5 (diffusers colab is 400/2e-6)
  • Looks to be a little overfitted, it was trained with 650 UNet steps at 2e-5, no text encoder training
  • Running inference again at guidance_scale = 7 (down from 11) looks a little better, but still very overfitted
  • Try:
    • Train again with 400 UNet steps



Jan 1, 17:05 - Trained Ae68tVcVmwwQ using LastBen's fast-stable-diffusion, 400 steps - still overfitting

Jan 1, 17:20 - Trained Ae68tVcVmwwQ using LastBen's 400 steps and 2e-6 - maybe too weak a lr

Jan 1, 17:40 - Trained Ae68tVcVmwwQ using LastBen's 400 steps, 2e-6, with text encoder train - better, but still weak

  • Trained with 400 steps and 2e-6, text encoder 150 and 1e-6
  • Looks like it is still undertrained, will try with more steps for the UNet and Text Encoder



Jan 1, 18:30 - Trained Ae68tVcVmwwQ using LastBen's, w/text encoder, more training - still kinda weak, not enough tortas

Jan 1, 18:50 - Trained Ae68tVcVmwwQ using LastBen's, w/text encoder, lr tweaking - better!




Jan 1, 19:15 - Trained Ae68tVcVmwwQ using LastBen's, w/text encoder, inc steps again -





Jan 1, 19:45 - Trained "torta sandwich" using LastBen's, w/text encoder -