LoRA fine-tuning of text2image

Created on January 17 | Last edited on January 17

Command used to run fine-tuning:

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATASET_NAME="lambdalabs/pokemon-blip-captions"

accelerate launch --gpu_ids="0," \
./train_text_to_image_lora.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$DATASET_NAME --caption_column="text" \
--resolution=512 --random_flip \
--train_batch_size=1 \
--num_train_epochs=100 --checkpointing_steps=5000 \
--learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
--seed=42 \
--save_sample_prompt="cute Sundar Pichai creature" --report_to="wandb"
Details of the train_text_to_image_lora.py script can be found in this PR: https://github.com/huggingface/diffusers/pull/2002.
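The script above injects LoRA adapters into the model's attention layers (the exact layers and default rank are implementation details of the PR). The core idea is a low-rank update to a frozen weight matrix, which can be sketched with NumPy; the dimensions, rank, and scaling value below are illustrative assumptions, not the script's actual settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical projection shape and rank, chosen for illustration only.
d_out, d_in, r = 320, 768, 4
W = rng.standard_normal((d_out, d_in))  # frozen pretrained weight

# LoRA trains only two small matrices A and B. B starts at zero, so at
# step 0 the adapted layer behaves exactly like the pretrained one.
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))

alpha = 4  # scaling hyperparameter (illustrative)
W_eff = W + (alpha / r) * (B @ A)  # effective weight used at inference

# With B initialised to zero, the update is zero and W_eff == W.
assert np.allclose(W_eff, W)

# Parameter count: full fine-tuning updates d_out * d_in weights,
# while LoRA updates only r * (d_in + d_out).
full_params = W.size
lora_params = A.size + B.size
print(full_params, lora_params)
```

This is why the LoRA checkpoints saved by the run are small: only A and B are trained and stored, while the base Stable Diffusion weights stay frozen.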

Results