Training Journal: DreamBooth Torta Fine-tuning
Created on December 23 | Last edited on January 5
My training journal for the Hugging Face DreamBooth fine-tuning hackathon. Here I'll be fine-tuning on images of tortas, a delicious variety of sandwich from Mexico!
Summary
- Adding a "class of thing" to the training and inference prompts unlocked the best results with SD v1-4, e.g. training on "torta sandwich" instead of just "torta"
- Training at higher learning rates, like in LastBen's colab, works for unique concept names like "Ae68tVcVmwwQ", but also overfits more easily
- Tried:
- more steps: didn't work, overfitted
- more unique concept names such as "zztortazz" and "Ae68tVcVmwwQ": didn't work so well, possibly because the text encoder needed to be trained too
- the diffusers library training script, train_dreambooth.py, with similar settings: no big difference
Tasks
- Fine-tune Stable Diffusion 1.4 on images of tortas
- Explore guidance_scale settings on 1 prompt
- Explore guidance_scale settings on 5 more diverse prompts
- Fine-tune Stable Diffusion 2.1 on images of tortas
Create a dataset of images of tortas
Create a dataset of tortas
Upload a folder of screenshots of tortas taken from Google review images of torta stands in Mexico City.
Tasty Tortas
Upload to HF Hub for use with the datasets library
from datasets import load_dataset

dataset = load_dataset("imagefolder", data_dir="data/tortas")
dataset.push_to_hub("tortas")
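To check the upload worked, the dataset can be pulled straight back down from the Hub - a minimal sketch, assuming it was pushed under my username as "morgan/tortas":

from datasets import load_dataset

dataset = load_dataset("morgan/tortas", split="train")
print(dataset[0]["image"])  # each row holds a PIL image of a torta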
Fine-tune Stable Diffusion
Fine-tune using the excellent DreamBooth hackathon colab provided:
Colab here
W&B Tables logging code
import wandb
import torch
from diffusers import StableDiffusionPipeline

# load the fine-tuned pipeline from the Hub
pipe = StableDiffusionPipeline.from_pretrained(
    "morgan/torta",
    torch_dtype=torch.float16,
).to("cuda")

num_cols = 6
name_of_your_concept = "torta"
prompt = f"a photo of a green field full of {name_of_your_concept}s"

# one image column per sample, plus metadata columns
img_ls = [f"im_{i}" for i in range(num_cols)]
cols = ["concept", "prompt", "guidance_scale"]
cols.extend(img_ls)

# start a wandb run and create a Table
wandb.init(entity="morgan", project="hf-dreambooth")
tbl = wandb.Table(columns=cols)

# add one row of samples per guidance_scale value
for guidance_scale in range(8, 12):
    all_images = []
    for _ in range(num_cols):
        images = pipe(prompt, guidance_scale=guidance_scale).images
        all_images.extend(images)
    tbl.add_data(
        name_of_your_concept, prompt, guidance_scale,
        wandb.Image(all_images[0]), wandb.Image(all_images[1]),
        wandb.Image(all_images[2]), wandb.Image(all_images[3]),
        wandb.Image(all_images[4]), wandb.Image(all_images[5]),
    )

# log the table and finish the run
wandb.log({"tortas-table": tbl})
wandb.finish()
Stable Diffusion v1-4
Dec 23, 17:50 - 1 row logged
Dec 23, 18:08 - samples logged, too much "green field" generated
Dec 23, 18:20 - Modified prompt, better!
New prompt: "lots of tortas in a green field" - much better as there are more samples with tortas.
However, at guidance_scale values from 7 to 12 there are still sometimes samples without any tortas.
Also, some generated tortas seem to have more green fillings (lettuce/spinach/cucumber/avocado) than tortas typically have - I count 3 out of 37 training images with bits of green. The model might be drawing too many parallels between tortas and sandwiches and/or hamburgers.
Ideas:
- Try more sampling with higher guidance_scale (sketched after this list)
- Train for longer
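A minimal sketch of the first idea, widening the guidance_scale sweep beyond the 8-11 range used in the logging code above (the 7-14 range and 6 images per scale are my assumptions; pipe and prompt as defined there):

# sample across a wider guidance_scale range than before
for guidance_scale in range(7, 15):
    images = pipe(prompt, guidance_scale=guidance_scale, num_images_per_prompt=6).images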
Dec 23, 18:35 - Higher guidance scale testing
Dec 23, 21:25 - Using more diverse prompts
Testing with 5 different torta prompts:
- "a photo of lots of tortas in a green field"
- "a sketch drawing of a torta floating through space"
- "a torta being eaten by a sheep, by Claude Monet"
- "a torta watching tv, pixar animation, artstation, realistic, 3d, 8k, unreal engine"
- "a happy torta on a postage stamp"
Mixed results here
Idea: I wonder if training with a more unique concept name would yield better results; I'll train with "zztortazz" as the concept name.
Dec 23, 22:10 - Re-trained with more unique concept, zztortazz - no good
Fine-tuned SD with "zztortazz" as the concept instead of "torta".
Looks like it's worse than "torta" - I guess the model already has a reference for what tortas are.
Dec 31, 11:55 - Re-trained 800 steps - too much torta?
Dec 31, 12:20 - Trained zztortazz for 800 steps - not great, doesn't get the concept
Dec 31, 12:35 - Trained Ae68tVcVmwwQ for 1400 steps - no Tortas, prob needs text encoder training too
Trained with a concept called Ae68tVcVmwwQ instead, to see if a more unique concept might help
- Completely lost the concept here, I guess the new token needs to be trained into the text encoder
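A quick way to see why: SD v1's text encoder is CLIP ViT-L/14, and its tokenizer shatters a made-up concept name into subtokens that carry no learned meaning. A minimal check:

from transformers import CLIPTokenizer

# the tokenizer used by the SD v1 text encoder
tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
for name in ["torta", "zztortazz", "Ae68tVcVmwwQ"]:
    print(name, "->", tok.tokenize(name))
# the made-up names split into several meaningless subtokens, which is
# presumably why the UNet alone can't bind them to the new concept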
Jan 1, 11:25 - Used diffusers' original train_dreambooth.py script - No visible improvement
Jan 1, 11:50 - Tried adding a "sandwich" class to the training prompt - success!
- Adding "sandwich" as a class of thing to the training prompt ("a photo of a torta sandwich") and the inference prompt really improved the results!
Stable Diffusion v1-5
Jan 1, 16:20 - Trained Ae68tVcVmwwQ using LastBen's fast-stable-diffusion colab - overfitted
- Used colab defaults with captions of "a photo of a Ae68tVcVmwwQ sandwich", i.e. 650 UNet steps and UNet lr of 2e-5 (diffusers colab is 400/2e-6)
- Looks to be a little overfitted (650 UNet steps at 2e-5, no text encoder training)
- Running inference again at guidance == 7 (down from 11) looks a little better, but still very overfitted
- Try:
- Train again with 400 UNet steps
Jan 1, 17:05 - Trained Ae68tVcVmwwQ using LastBen's fast-stable-diffusion, 400 steps - still overfitting
Jan 1, 17:20 - Trained Ae68tVcVmwwQ using LastBen's 400 steps and 2e-6 - maybe too weak a lr
Jan 1, 17:40 - Trained Ae68tVcVmwwQ using LastBen's 400 steps, 2e-6, with text encoder train - better, but still weak
- Trained with 400 UNet steps at 2e-6, plus 150 text encoder steps at 1e-6
- Looks like it is still undertrained; will try more steps for both the UNet and the text encoder
Jan 1, 18:30 - Trained Ae68tVcVmwwQ using LastBen's, w/text encoder, more training - still kinda weak, not enough tortas
Jan 1, 18:50 - Trained Ae68tVcVmwwQ using LastBen's, w/text encoder, lr tweaking - better!
- Trained with UNet 6e-6/400 and TextEncoder 1e-6/150
- Let's try keeping the lr at 6e-6 and increasing the steps
Jan 1, 19:15 - Trained Ae68tVcVmwwQ using LastBen's, w/text encoder, inc steps again -
- Trained with UNet 6e-6/600 and TextEncoder 1e-6/250
- Try:
- Train for another 200 steps
Jan 1, 19:45 - Trained "torta sandwich" using LastBen's, w/text encoder -
- Trained with UNet 6e-6/600 and TextEncoder 1e-6/250