Training Journal: DreamBooth Torta Fine-tuning
Created on December 23 | Last edited on January 5
My training journal for the Hugging Face DreamBooth fine-tuning hackathon. Here I'll be fine-tuning on images of tortas, a delicious variety of sandwich from Mexico!
Summary
- Adding a "class of thing" to the training and inference prompts unlocked the best results with SD v1-4, e.g. training on "torta sandwich" instead of just "torta"
- Training at higher learning rates, like in LastBen's colab, works for unique concept names like "Ae68tVcVmwwQ", but also overfits more easily
- Tried:
- more steps: didn't work, overfitted
- more unique concept names such as "zztortazz" and "Ae68tVcVmwwQ": didn't work so well, possibly the text encoder needed to be trained too
- the diffusers library training script, train_dreambooth.py, with similar settings: no big difference
Tasks
- Fine-tune Stable Diffusion 1.4 on images of tortas
- Explore guidance_scale settings on 1 prompt
- Explore guidance_scale settings on 5 more diverse prompts
- Fine-tune Stable Diffusion 2.1 on images of tortas
Create a dataset of images of tortas
Create a dataset of tortas
Upload a folder of screenshots of tortas, taken from Google review images of torta stands in Mexico City.
Tasty Tortas
Upload to HF Hub for use with the datasets library
from datasets import load_dataset

dataset = load_dataset("imagefolder", data_dir="data/tortas")
dataset.push_to_hub("tortas")
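Before pushing, it can be worth a quick sanity check that the folder actually contains the image files the imagefolder loader will pick up. A minimal sketch (the data/tortas path and the extension set are assumptions):

```python
from pathlib import Path

def count_images(data_dir):
    # imagefolder picks up image files in the folder (or split subfolders);
    # count them so a typo'd path or empty folder fails loudly before push_to_hub
    exts = {".png", ".jpg", ".jpeg"}
    return sum(1 for p in Path(data_dir).rglob("*") if p.suffix.lower() in exts)
```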
Fine-tune Stable Diffusion
Fine-tune using the excellent DreamBooth hackathon colab provided:
Colab here
W&B Tables logging code
# log generated samples to a wandb Table
import wandb
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "morgan/torta",
    torch_dtype=torch.float16,
).to("cuda")

num_cols = 6
name_of_your_concept = "torta"
prompt = f"a photo of a green field full of {name_of_your_concept}s"

# one image column per sample
img_ls = [f"im_{i}" for i in range(num_cols)]
cols = ["concept", "prompt", "guidance_scale"] + img_ls

# start a wandb run and create a Table
wandb.init(entity="morgan", project="hf-dreambooth")
tbl = wandb.Table(columns=cols)

for guidance_scale in range(8, 12):
    all_images = []
    for _ in range(num_cols):
        images = pipe(prompt, guidance_scale=guidance_scale).images
        all_images.extend(images)
    # add a row of data to the wandb Table
    tbl.add_data(
        name_of_your_concept,
        prompt,
        guidance_scale,
        *[wandb.Image(img) for img in all_images],
    )

# log the wandb Table
wandb.log({"tortas-table": tbl})
wandb.finish()
Stable Diffusion v1-4
Dec 23, 17:50 - 1 row logged
Dec 23, 18:08 - samples logged, too much "green field" generated
Dec 23, 18:20 - Modified prompt, better!
New prompt: "lots of tortas in a green field" - much better as there are more samples with tortas.
However, with guidance_scale from 7 to 12 there are still occasionally samples without any tortas.
Also, some generated tortas seem to have more green fillings (maybe lettuce/spinach/cucumber/avocado) than tortas typically have: I count 3 out of 37 training images with bits of green. The model might be drawing too many parallels between tortas and sandwiches and/or hamburgers.
Ideas:
- Try more sampling with higher guidance_scale
- Train for longer
Dec 23, 18:35 - Higher guidance scale testing
Dec 23, 21:25 - Using more diverse prompts
Testing with 5 different torta prompts:
- "a photo of lots of tortas in a green field"
- "a sketch drawing of a torta floating through space"
- "a torta being eaten by a sheep, by Claude Monet"
- "a torta watching tv, pixar animation, artstation, realistic, 3d, 8k, unreal engine"
- "a happy torta on a postage stamp"
Mixed results here
Idea: I wonder if training with a more unique concept name will yield better results, will train with zztortazz as the concept name.
Dec 23, 22:10 - Re-trained with more unique concept, zztortazz - no good
Fine-tuned SD with "zztortazz" as the concept instead of "torta".
Looks like it's worse than "torta"; I guess the model already has a reference for what tortas are.
Dec 31, 11:55 - Re-trained 800 steps - too much torta?
Dec 31, 12:20 - Trained zztortazz for 800 steps - not great, doesn't get the concept
Dec 31, 12:35 - Trained Ae68tVcVmwwQ for 1400 steps - no Tortas, prob needs text encoder training too
Trained with a concept called Ae68tVcVmwwQ instead, to see if a more unique concept might help
- Completely lost the concept here, I guess the new token needs to be trained into the text encoder
Jan 1, 11:25 - Used diffusers' original train_dreambooth.py' script - No visible improvement
Jan 1, 11:50 - Try add "sandwich" class to the training prompt - success!
- Adding "sandwich" as a class of thing to the training prompt ("a photo of a torta sandwich") and the inference prompt really improved the results!
Stable Diffusion v1-5
Jan 1, 16:20 - Trained Ae68tVcVmwwQ using LastBen's fast-stable-diffusion colab - overfitted
- Used colab defaults with captions of "a photo of a Ae68tVcVmwwQ sandwich", i.e. 650 UNet steps and UNet lr of 2e-5 (diffusers colab is 400/2e-6)
- Looks to be a little overfitted, it was trained with 650 UNet steps at 2e-5, no text encoder training
- Running inference again at guidance == 7 (down from 11) looks a little better, but still very overfitted
- Try:
- Train again with 400 UNet steps
Jan 1, 17:05 - Trained Ae68tVcVmwwQ using LastBen's fast-stable-diffusion, 400 steps - still overfitting
Jan 1, 17:20 - Trained Ae68tVcVmwwQ using LastBen's 400 steps and 2e-6 - maybe too weak a lr
Jan 1, 17:40 - Trained Ae68tVcVmwwQ using LastBen's 400 steps, 2e-6, with text encoder train - better, but still weak
- Trained with 400 steps and 2e-6, text encoder 150 and 1e-6
- Looks like it is still undertrained, will try with more steps for the UNet and Text Encoder
Jan 1, 18:30 - Trained Ae68tVcVmwwQ using LastBen's, w/text encoder, more training - still kinda weak, not enough tortas
Jan 1, 18:50 - Trained Ae68tVcVmwwQ using LastBen's, w/text encoder, lr tweaking - better!
- Trained with UNet 6e-6/400 and TextEncoder 1e-6/150
- Let's try keeping the lr at 6e-6 and increasing steps
Jan 1, 19:15 - Trained Ae68tVcVmwwQ using LastBen's, w/text encoder, inc steps again -
- Trained with UNet 6e-6/600 and TextEncoder 1e-6/250
- Try:
- Train for another 200 steps
Jan 1, 19:45 - Trained "torta sandwich" using LastBen's, w/text encoder -
- Trained with UNet 6e-6/600 and TextEncoder 1e-6/250
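For quick reference, the v1-5 LastBen runs above that have settings recorded can be collected in one place (verdicts paraphrased from the notes; the final run's verdict was not recorded):

```python
# UNet / text-encoder settings tried with LastBen's colab, from the journal above
runs = [
    {"unet_steps": 650, "unet_lr": 2e-5, "te_steps": 0,   "te_lr": None, "verdict": "overfitted"},
    {"unet_steps": 400, "unet_lr": 2e-5, "te_steps": 0,   "te_lr": None, "verdict": "still overfitting"},
    {"unet_steps": 400, "unet_lr": 2e-6, "te_steps": 0,   "te_lr": None, "verdict": "maybe too weak a lr"},
    {"unet_steps": 400, "unet_lr": 2e-6, "te_steps": 150, "te_lr": 1e-6, "verdict": "better, but still weak"},
    {"unet_steps": 400, "unet_lr": 6e-6, "te_steps": 150, "te_lr": 1e-6, "verdict": "better!"},
    {"unet_steps": 600, "unet_lr": 6e-6, "te_steps": 250, "te_lr": 1e-6, "verdict": "(not recorded)"},
]
```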