DreamBooth fgreeneruins-ruins
Quick Training Report for the Hugging Face DreamBooth fine-tuning hackathon.
Concept : fgreeneruins : Forest ruins, greenery
Theme : Landscape
DreamBooth is a technique for teaching new concepts to Stable Diffusion using a specialized form of fine-tuning: the model learns a new word from a small set of corresponding images. To help the model understand this new 'concept' (here "fgreeneruins"), you also give it the class of the concept (here "ruins").
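As a minimal illustration, the concept is introduced through the training prompt, which pairs the rare token with its class (the exact wiring below is a sketch, not taken from the report):

```python
# Illustrative sketch: the rare token "fgreeneruins" names the new concept,
# and "ruins" tells the model which class the concept belongs to.
instance_prompt = "a photo of fgreeneruins ruins"  # unique token + class
class_prompt = "a photo of ruins"                  # class only; used for prior preservation (not in these runs)
```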
Intro (from the HF DreamBooth report)
- With DreamBooth, Stable Diffusion overfits quickly. It's important to find the right learning rate (LR) and number of training steps for your dataset. Training with a higher learning rate for fewer steps and training with a lower learning rate for more steps give very similar results. We have to find the 'sweet spot' number of training steps for a given learning rate to get reasonable images.
- If the generated images are noisy or their quality is degraded, it likely means overfitting. First, try the steps above to avoid it. If the generated images are still noisy, use the DDIM scheduler or run more inference steps (~100 worked well in our experiments); see the inference sketch after this list.
- HF DreamBooth experiments -> using EMA doesn't seem to make a difference.
- Getting good results when training DreamBooth requires a lot of tweaking.
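For example, here is a minimal inference sketch (not from the original runs; the model path is hypothetical) showing how to switch to the DDIM scheduler and run ~100 inference steps with diffusers:

```python
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

# Load a fine-tuned DreamBooth model (hypothetical local path).
pipe = StableDiffusionPipeline.from_pretrained(
    "./fgreeneruins-model",
    torch_dtype=torch.float16,
).to("cuda")

# Swap the default scheduler for DDIM, reusing the trained model's scheduler config.
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

# Run more inference steps than the default (~50); ~100 worked well in the HF experiments.
image = pipe("a photo of fgreeneruins ruins", num_inference_steps=100).images[0]
image.save("fgreeneruins_sample.png")
```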
Experiments
Settings
- Dataset : CCMat/db-forest-ruins
- Size : 17 images
- Selected pretrained models:
- prompthero/openjourney
- nitrosocke/elden-ring-diffusion
- AdamW optimizer
- No prior preservation is used.
- No fine-tuning of the text encoder (due to computational constraints)
- All hyperparameters were kept equal across runs, except the LR, training steps, and gradient accumulation steps
- fixed hyperparameters include:
- lr_scheduler : constant
- resolution : 512
- train_batch_size : 1
- using the 8-bit Adam optimizer from bitsandbytes
- Learning rates tested : 2e-6, 1e-6
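For reference, here is a sketch of how these settings map onto the flags of the diffusers DreamBooth example script (examples/dreambooth/train_dreambooth.py); the data and output paths are assumptions, not taken from the report:

```python
import subprocess

args = {
    "--pretrained_model_name_or_path": "prompthero/openjourney",  # or nitrosocke/elden-ring-diffusion
    "--instance_data_dir": "./db-forest-ruins",        # assumed local copy of CCMat/db-forest-ruins
    "--instance_prompt": "a photo of fgreeneruins ruins",
    "--output_dir": "./fgreeneruins-model",            # hypothetical
    "--resolution": "512",
    "--train_batch_size": "1",
    "--gradient_accumulation_steps": "1",              # 2 was also tried at LR 1e-6
    "--learning_rate": "2e-6",                         # or 1e-6
    "--lr_scheduler": "constant",
    "--max_train_steps": "500",                        # varied per run
}
cmd = ["accelerate", "launch", "train_dreambooth.py"]
for flag, value in args.items():
    cmd += [flag, value]
cmd.append("--use_8bit_adam")  # bitsandbytes 8-bit AdamW
# No --with_prior_preservation and no --train_text_encoder, per the settings above.
subprocess.run(cmd, check=True)
```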
Learning Rate 2e-6
- Images start to get noisy/degraded around step 500 -> the model is overfitting
- The images don't seem to really assimilate the concept before step 300
- Decided to log samples every 17 steps (one pass over our training set) between steps 300 and 442
(W&B panel: run set, 4 runs)
- tried different prompts :
- "a photo of fforuins ruins"
- "a photo of fgreeneruins ruins" -> seems to create "greener" images -> better
=> Most promising steps :
- 340 - 357 - 374 for both pretrained models (a side-by-side prompt comparison at one of these checkpoints is sketched below)
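A sketch of that comparison, assuming each logged step was exported as a full pipeline directory (the checkpoint path is hypothetical):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load one of the promising checkpoints (hypothetical export path).
pipe = StableDiffusionPipeline.from_pretrained(
    "./fgreeneruins-model/checkpoint-357",
    torch_dtype=torch.float16,
).to("cuda")

# Use the same seed for both prompts so the comparison is apples-to-apples.
for prompt in ("a photo of fforuins ruins", "a photo of fgreeneruins ruins"):
    generator = torch.Generator("cuda").manual_seed(0)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"{prompt.split()[3]}_step357.png")
```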
Learning Rate 1e-6
- Images start to get noisy/degraded after step 800 -> the model is overfitting
- The images don't seem to really assimilate the concept before step 400
(W&B panel: run set, 3 runs)
- tried the same two prompts as above : "a photo of fgreeneruins ruins" again seems to create "greener" images -> better
- Increasing gradient accumulation steps to 2 seems to give less clear images and to overfit more quickly
- it also seems harder to find the correct hyperparameter settings
=> Most promising steps : between 700 and 800
After comparing the outputs of the models at these steps :
- prompthero/openjourney : step 782
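The per-step comparison can be scripted the same way: fix the seed and prompt and sweep the candidate checkpoints. In this sketch, the step values other than 782 and all paths are illustrative:

```python
import torch
from diffusers import StableDiffusionPipeline

prompt = "a photo of fgreeneruins ruins"
for step in (748, 765, 782):  # candidate steps in the 700-800 range (782 was the pick)
    pipe = StableDiffusionPipeline.from_pretrained(
        f"./fgreeneruins-model/checkpoint-{step}",  # hypothetical export path
        torch_dtype=torch.float16,
    ).to("cuda")
    generator = torch.Generator("cuda").manual_seed(0)  # same seed across steps
    pipe(prompt, generator=generator).images[0].save(f"compare_step_{step}.png")
```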
Comparison of best results
Best Results
- 2e-6 :
- openjourney - step 357
- elden-ring - step 340
- 1e-6 : openjourney - step 782
Summary
To get good images that incorporate the concept well without degrading other objects, it's important to:
- Tune the learning rate and training steps for your dataset.
- High learning rates and too many training steps will lead to overfitting (in other words, the model can only generate images resembling your training data, no matter the prompt).
- Low learning rates and too few steps will lead to underfitting, where the model cannot generate the trained concept.
- The image quality degrades quite a lot if the model overfits and this happens if:
- The learning rate is too high
- We run too many training steps
- Adding gradient accumulation steps doesn't seem to improve the quality of the images