Dreambooth fgreeneruins-ruins

Quick training report for the Hugging Face DreamBooth fine-tuning hackathon. Concept: fgreeneruins (forest ruins, greenery). Theme: Landscape.
DreamBooth is a technique for teaching new concepts to Stable Diffusion using a specialized form of fine-tuning: the model learns a new word from a set of corresponding images. To help the model understand this new 'concept' (here "fgreeneruins"), you also give it the class of the concept (here "ruins").


Intro (from the HF DreamBooth report)

  • With DreamBooth, Stable Diffusion overfits quickly. It's important to find the right learning rate (LR) and number of training steps for your dataset. Training with a higher learning rate for fewer steps and training with a lower learning rate for more steps give very similar results, so we have to find the 'sweet spot' number of training steps for a given learning rate to get reasonable images.
  • If the generated images are noisy or the quality is degraded, the model is likely overfitting. First, try the steps above to avoid it. If the generated images are still noisy, use the DDIM scheduler or run more inference steps (~100 worked well in our experiments); see the sketch after this list.
  • From the HF DreamBooth experiments: using EMA doesn't seem to make a difference.
  • Getting good results when training DreamBooth requires a lot of tweaking.
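
A minimal sketch of the two mitigations above, swapping in the DDIM scheduler and raising the number of inference steps. The checkpoint path is a placeholder for a fine-tuned pipeline:

```python
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

# Placeholder path: a DreamBooth checkpoint saved as a full pipeline.
pipe = StableDiffusionPipeline.from_pretrained(
    "./fgreeneruins-model", torch_dtype=torch.float16
).to("cuda")

# Swap in the DDIM scheduler, reusing the existing scheduler's config.
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

# Run more inference steps (~100 worked well in the HF experiments).
image = pipe("a photo of fgreeneruins ruins", num_inference_steps=100).images[0]
image.save("sample.png")
```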


Experiments

Settings

  • Dataset : CCMat/db-forest-ruins
    • 17 training images
  • Selected pretrained models:
    • prompthero/openjourney
    • nitrosocke/elden-ring-diffusion
  • AdamW optimizer
  • No prior preservation is used.
  • The text encoder is not fine-tuned (due to computational constraints)
  • All hyperparameters were kept equal across runs, except the LR, training steps, and gradient accumulation steps
  • fixed hyperparameters include:
    • lr_scheduler : constant
    • resolution : 512
    • train_batch_size : 1
    • using 8bit optimizer from bitsandbytes
  • Learning rates tested: 2e-6, 1e-6
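
As a concrete reference, this is roughly how the settings above map onto the flags of the diffusers train_dreambooth.py example script. This is a sketch, not the exact command used; the data path and output directory are placeholders:

```python
import subprocess

# Assumes a local copy of the CCMat/db-forest-ruins images and the
# diffusers DreamBooth example script. No prior-preservation or
# text-encoder flags are passed, matching the settings above.
subprocess.run([
    "accelerate", "launch", "train_dreambooth.py",
    "--pretrained_model_name_or_path", "prompthero/openjourney",
    "--instance_data_dir", "./db-forest-ruins",   # 17 training images (placeholder path)
    "--instance_prompt", "a photo of fgreeneruins ruins",
    "--resolution", "512",
    "--train_batch_size", "1",
    "--gradient_accumulation_steps", "1",         # 2 was also tried
    "--learning_rate", "2e-6",                    # 1e-6 also tested
    "--lr_scheduler", "constant",
    "--max_train_steps", "500",
    "--use_8bit_adam",                            # bitsandbytes 8-bit optimizer
    "--output_dir", "./fgreeneruins-model",       # placeholder
], check=True)
```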


Learning Rate 2e-6

  • Images start to get noisy/degraded around step 500 -> the model is overfitting
  • The images don't really assimilate the concept before step 300
  • Decided to log samples every 17 steps (one pass over our training set) between steps 300 and 442

[Run set panel: 4 runs]

  • Tried different prompts:
    • "a photo of fforuins ruins"
    • "a photo of fgreeneruins ruins" -> seems to create "greener" images -> better
=> Most promising steps: 340, 357, and 374 for both pretrained models
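
To keep the prompt comparison fair, it helps to re-seed the generator before each prompt so that only the prompt changes between images. A minimal sketch (checkpoint path and seed are placeholders):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "./fgreeneruins-model",  # placeholder checkpoint path
    torch_dtype=torch.float16,
).to("cuda")

prompts = ["a photo of fforuins ruins", "a photo of fgreeneruins ruins"]
for prompt in prompts:
    # Re-seed before each prompt so the initial latents are identical.
    generator = torch.Generator(device="cuda").manual_seed(42)
    image = pipe(prompt, generator=generator).images[0]
    image.save(prompt.replace(" ", "_") + ".png")
```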


Learning Rate 1e-6

  • Images start to get noisy/degraded after step 800 -> the model is overfitting
  • The images don't really assimilate the concept before step 400

[Run set panel: 3 runs]

  • Tried different prompts:
    • "a photo of fforuins ruins"
    • "a photo of fgreeneruins ruins" -> seems to create "greener" images -> better
  • Increasing gradient accumulation steps to 2 seems to give less clear images and overfit more quickly
    • it seems harder to find the right hyperparameter settings
=> Most promising steps: between 700 and 800
After comparing the outputs of the models at these steps:
  • prompthero/openjourney : step 782
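
A sketch of this kind of checkpoint sweep, assuming samples were again logged every 17 steps and each logged step was saved as a full pipeline under a step-numbered directory (hypothetical layout):

```python
import torch
from diffusers import StableDiffusionPipeline

# Hypothetical layout: one full pipeline saved per logged training step.
for step in range(714, 800, 17):  # multiples of 17 between 700 and 800
    pipe = StableDiffusionPipeline.from_pretrained(
        f"./fgreeneruins-model/step-{step}",  # placeholder checkpoint path
        torch_dtype=torch.float16,
    ).to("cuda")
    # Fixed seed so differences come from the checkpoint, not the latents.
    generator = torch.Generator(device="cuda").manual_seed(42)
    image = pipe("a photo of fgreeneruins ruins", generator=generator).images[0]
    image.save(f"step_{step}.png")
```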


Comparison of best results

[Image panels: side-by-side outputs of the best checkpoints from both pretrained models]

Best Results

  • 2e-6 :
    • openjourney - step 357
    • elden-ring - step 340
  • 1e-6 : openjourney - step 782



Summary

To get good images that incorporate the concept well without degrading other objects, it's important to:
  • Tune the learning rate and training steps for your dataset.
    • High learning rates and too many training steps will lead to overfitting (in other words, the model can only generate images from your training data, no matter the prompt).
    • Low learning rates and too few steps will lead to underfitting: the model cannot generate the trained concept.
  • The image quality degrades quite a lot if the model overfits and this happens if:
    1. The learning rate is too high
    2. We run too many training steps
  • Increasing gradient accumulation steps doesn't seem to improve the quality of the images