
Dreambooth fluffalpaca-llama

Quick training report for the Hugging Face DreamBooth fine-tuning hackathon. Concept: fluffalpaca (a fluffy alpaca). Theme: Animal
Created on January 23|Last edited on January 27
DreamBooth is a technique for teaching new concepts to Stable Diffusion using a specialized form of fine-tuning: the model learns a new word from a set of corresponding images. To help the model understand this new 'concept' (here "fluffalpaca"), you also give it the class of the concept (here "llama").
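The idea can be made concrete with a minimal sketch of DreamBooth's two-term training objective, not taken from this report: the usual denoising loss on the instance ("fluffalpaca") images, plus a weighted "prior preservation" loss on generic class ("llama") images. Names and shapes below are illustrative assumptions.

```python
import numpy as np

def dreambooth_loss(pred_instance, target_instance,
                    pred_class, target_class, prior_loss_weight=1.0):
    """Two-term DreamBooth objective (sketch): MSE denoising loss on the
    instance images, plus a weighted MSE 'prior preservation' loss on class
    images, which keeps the model from forgetting what a generic 'llama'
    looks like while it learns 'fluffalpaca'."""
    instance_loss = np.mean((pred_instance - target_instance) ** 2)
    prior_loss = np.mean((pred_class - target_class) ** 2)
    return instance_loss + prior_loss_weight * prior_loss

# Toy arrays standing in for predicted / target noise latents.
rng = np.random.default_rng(0)
shape = (1, 4, 8, 8)
loss = dreambooth_loss(rng.normal(size=shape), rng.normal(size=shape),
                       rng.normal(size=shape), rng.normal(size=shape))
```

Without the prior term (`prior_loss_weight=0`), training degenerates into plain fine-tuning on the instance images, which is exactly the setting compared in the "Without Prior Preservation" runs below.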




Intro (heavily inspired by the HF DreamBooth report)

Experiments

Settings

  • Selected pretrained models:
    • prompthero/openjourney
    • runwayml/stable-diffusion-v1-5
    • stabilityai/stable-diffusion-2 (best results)
  • Dataset :
    • the first dataset used was a set of 13 images of alpacas
      • it was not diverse enough to produce good results
    • CCMat/db-aplaca (final)
      • len = 22
  • AdamW optimizer
  • No fine-tuning of the text encoder (due to computational constraints)
  • kept all hyperparameters equal across runs, except the learning rate, training steps and gradient accumulation steps
  • fixed hyperparameters include:
    • lr_scheduler : constant
    • resolution : 512 / 768 (for stable diffusion 2)
    • train_batch_size : 1
    • using 8bit optimizer from bitsandbytes
  • Learning rates tested : 2e-6, 1e-6, 9e-7
  • Class for prior preservation : llama
  • Class dataset for prior preservation : CCMat/llama
    • len = 52
    • I created my own class dataset of llamas because the pretrained models were generating bad images of llamas when doing prior preservation
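The settings above map fairly directly onto the flags of the diffusers `train_dreambooth.py` script. A hedged sketch of the best-performing configuration follows; the flag names come from diffusers, but the exact prompt strings and the use of dataset IDs in place of local image directories are assumptions on my part, not confirmed by the report.

```python
# Sketch of the winning configuration, expressed as the flags of the
# diffusers `train_dreambooth.py` script. Values reflect the settings
# listed above; prompt strings and data paths are illustrative.
config = {
    "pretrained_model_name_or_path": "stabilityai/stable-diffusion-2",
    "instance_data_dir": "CCMat/db-aplaca",    # 22 instance images (dataset ID, not a local dir)
    "instance_prompt": "a photo of fluffalpaca llama",
    "with_prior_preservation": True,
    "class_data_dir": "CCMat/llama",           # 52 class images
    "class_prompt": "a photo of llama",
    "resolution": 768,                         # 512 for the SD 1.x runs
    "train_batch_size": 1,
    "gradient_accumulation_steps": 1,
    "learning_rate": 1e-6,
    "lr_scheduler": "constant",
    "max_train_steps": 1100,
    "use_8bit_adam": True,                     # bitsandbytes 8-bit optimizer
    "train_text_encoder": False,
}
```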


Without Prior Preservation

  • prompt : "A photo of fluffllama llama"
  • first dataset of 13 images of alpacas (it wasn't varied enough)

Mid Learning Rate 2e-6


[W&B panel — run set: 2 runs]

  • the models have difficulty generating the faces of the alpacas
  • the bodies are a bit distorted / body proportions are off


Low Learning Rates 1e-6 & 9e-7


[W&B panel — run set: 3 runs]

  • the models have difficulty generating the faces of the alpacas
  • decreasing the learning rate to 9e-7 doesn't produce better results


Findings

  • without prior preservation it is hard to generate the faces of the alpacas correctly
  • increasing gradient accumulation steps to 2 doesn't seem to improve image quality
  • decreasing the learning rate to 1e-6 seems to give the best results (at least it's easier to find the 'sweet spot', especially for the faces)


With Prior Preservation (better results)

  • better results especially for the faces of the alpacas
  • prompt : "A photo of fluffalpaca llama"
  • dataset : CCMat/db-aplaca
  • class_dataset : CCMat/llama
  • gradient accumulation steps kept at 1
  • the pretrained models don't generate good images of alpacas or llamas, so I created my own class dataset of llamas for prior preservation

Mid Learning Rate 2e-6


[W&B panel — run set: 3 runs]



[W&B panel — run set: 1 run]

  • Difficult to generate good images of alpacas without overfitting


Low Learning Rate 1e-6


[W&B panel — run set: 1 run]

(ignore the step numbers in the image table above)


[W&B panel — run set: 1 run]



[W&B panel — run set: 1 run]

  • a prior loss of 0.6 seems to give the best results
  • the models seem to overfit after step 1300
  • stabilityai/stable-diffusion-2 produces better images of 'alpacas' than runwayml/stable-diffusion-v1-5
Best model :
- stabilityai/stable-diffusion-2 : steps 1034, 1056, 1078 and 1100


Findings

  • The pretrained stabilityai/stable-diffusion-2 generates the best images for my concept
  • A low learning rate of 1e-6 gives better results than a learning rate of 2e-6
  • With prior preservation, our models generate better faces for the subject
  • For the pretrained stabilityai/stable-diffusion-2, the best settings are the following :
    • learning rate : 1e-6
    • step : 1034 - 1056 - 1078 - 1100
    • prior loss : 0.6
    • class : llama
    • class_dataset : CCMat/llama
      • len : 52


Comparison of best results

  • Note : while training this last run, I added 8 images generated by Stable Diffusion to the class_dataset for prior preservation (I generated 16 and handpicked 8)

[W&B panel — run set: 1 run]




  • At step 1012 the model looks promising, although the subject's faces are still a bit distorted
=> best models :
- stabilityai/stable-diffusion-2 : step 1100
- stabilityai/stable-diffusion-2 : step 1078



Summary (heavily inspired by the HF DreamBooth report)

To get good images that incorporate the concept well without degrading other objects, it's important to:
  • Tune the learning rate and training steps for your dataset.
    • High learning rates and too many training steps will lead to overfitting (in other words, the model can only generate images from your training data, no matter the prompt).
    • Low learning rates and too few steps will lead to underfitting: the model cannot generate the trained concept.
  • 1e-6 with ~1100 steps seems to work well for the faces of our subject
  • If the model has difficulty generating faces without overfitting => use prior preservation
  • The image quality degrades quite a lot if the model overfits and this happens if:
    1. The learning rate is too high
    2. We run too many training steps
    3. In the case of faces, when no prior preservation is used
  • If the image quality is still degraded even after these changes:
    • Try different schedulers
    • Use more inference steps
  • A diverse dataset is important for fine-tuning Stable Diffusion with DreamBooth (especially to generate a concept that belongs to a class Stable Diffusion has difficulty generating).
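The last two remedies (trying a different scheduler, using more inference steps) can be sketched with the diffusers API. This is a minimal illustration, not the exact inference setup used in the report: the local `./fluffalpaca-sd2` weights directory is hypothetical, and the scheduler and step count are just one reasonable choice.

```python
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

def generate(prompt: str, model_dir: str = "./fluffalpaca-sd2"):
    # Load the fine-tuned DreamBooth weights (hypothetical local path).
    pipe = StableDiffusionPipeline.from_pretrained(
        model_dir, torch_dtype=torch.float16
    )
    # Swap in a different scheduler; this can help when image quality
    # is degraded with the default one.
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config
    )
    pipe = pipe.to("cuda")
    # More inference steps generally trade speed for quality.
    return pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]

if __name__ == "__main__":
    image = generate("A photo of fluffalpaca llama")
    image.save("fluffalpaca.png")
```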