
Partial LAION-400M on Imagen

Here I detail my experience and thoughts from experimenting with Imagen-PyTorch, Phil Wang's (aka Lucidrains) GitHub implementation of Imagen, the model created by Google Research. This being my first W&B Report, I apologize if it's a bit all over the place and has issues; I'll essentially be making multiple drafts until my experiment is complete, and then I'll finalize it.

The Imagen Implementation

From everything I've seen (all of the progress this repo has gone through, everyone's results in the LAION Discord, and my own experiments), Phil Wang has done very well in replicating SOTA models from paper to code, and this repo is no different. I proudly ended up sponsoring Phil on GitHub for his exemplary work, and I hope to contribute more than just a bit of money in the future, once my knowledge of ML grows beyond somewhere between novice and intermediate.

My Experiments, Mistakes, and Solutions

Quick little disclaimer: many of the experiments performed so far are all over the place in terms of parameters, botched and completed runs, etc., as I try to fine-tune things and learn as I go before attempting any real project. I enjoy stuff like this as a hobby, so please don't expect anything professional! Just enjoy my drafty notes for now. ^~^
I first did some runs, unrecorded in W&B, on my local machine with an NVIDIA RTX 3060 Ti just to try things out. I couldn't use a very large batch size due to my 8GB VRAM limit, but since this implementation can accumulate gradients up to an arbitrary effective batch size (given more time), that happily became a non-issue. Things seemed to run fine, albeit not very quickly; even small batches took a while. The later images, generated with the hyperparameters others had tried on the repo and in the LAION Discord (e.g. 5 epochs), were minimalistic at best, and I wondered why they weren't coming out like the others I'd seen.
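To make the accumulation point concrete, here's a rough sketch of the training call, closely following the imagen-pytorch README at the time; the dims, mock tensors, and batch numbers are illustrative rather than my exact configuration:

```python
import torch
from imagen_pytorch import Unet, Imagen, ImagenTrainer

# a small base unet, just to illustrate the API
unet1 = Unet(
    dim = 32,
    cond_dim = 512,
    dim_mults = (1, 2, 4, 8),
    num_resnet_blocks = 3,
    layer_attns = (False, True, True, True),
)

imagen = Imagen(
    unets = (unet1,),
    image_sizes = (64,),
    timesteps = 1000,
    cond_drop_prob = 0.1,
).cuda()

trainer = ImagenTrainer(imagen)

# mock batch of 64 image-text pairs (texts already embedded, e.g. with T5-base)
text_embeds = torch.randn(64, 256, 768).cuda()
images = torch.randn(64, 3, 64, 64).cuda()

# max_batch_size splits the effective batch of 64 into chunks of 8 and
# accumulates gradients, so an 8GB card can still "see" a large batch
loss = trainer(images, text_embeds = text_embeds, unet_number = 1, max_batch_size = 8)
trainer.update(unet_number = 1)
```

In other words, the batch size you ask for is the effective one; max_batch_size just caps how much goes through the GPU at once.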
For my very first tests I used the first 9,999 image-text pairs of LAION-400M. That didn't seem to go anywhere, so I moved to Google Colab. It was there, reading more on both the Discord and the repo issues, that I realized I was missing further hyperparameter changes, like batch sizes needing to be at least 32 to get anywhere close to something. This is when I decided to record everything to W&B, to store the hyperparameters used per run, see the loss differences, and at least keep a record of GPU memory allocated as things change. For all runs as of 6-16-2022, the memory graph below comes from a Tesla V100 on Google Colab. If I get a higher-memory card, the percentage usage may be skewed at the same max batch size; I will attempt to scale accordingly to prevent this.
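The W&B side of this is nothing fancy; roughly, it looks like the sketch below. The project name, config keys, and fake train_step are hypothetical placeholders, and W&B's own system metrics cover GPU usage as well:

```python
import torch
import wandb

# hyperparameters attached to the run config (values here are illustrative, not my exact settings)
run = wandb.init(project = "imagen-laion400m-partial", config = {
    "unet_dims": 64,
    "batch_size": 32,
    "max_batch_size": 16,
    "epochs": 5,
    "cond_scale": 3.0,
})

def train_step():
    # stand-in for one call to the imagen-pytorch trainer; returns a fake loss
    return torch.rand(1).item()

for step in range(100):
    loss = train_step()
    wandb.log({
        "loss": loss,
        "gpu_mem_allocated_gb": torch.cuda.memory_allocated() / 1e9 if torch.cuda.is_available() else 0.0,
    }, step = step)

run.finish()
```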
As of 6-20-2022, I got a few test runs working on my A100 workstation, and I'm now starting to use it for increasingly large runs.

Graphs

As shown below, the first graph plots the loss of each run. Except for the first run, every run should have its relevant hyperparameters stored in its respective info section. More may appear as I learn what else to change across runs.
GPU memory allocation is also monitored so I can see roughly how much memory different hyperparameter choices use.

Summarized Extra Findings

  • Batch sizes must be at least 32 in order for training to really go anywhere.
  • 5 epochs seems to be standard.
  • The conditioning scale (cond_scale) should be around 3 or so.
  • 32-dimensional unets, which I'd first seen as the default, perform far worse than something like a 128+ dimensional unet, as I learned from a warning recently added (as of 6-16-2022) to Imagen-PyTorch. Thanks, Phil!
  • I was not providing the correct image sizes for the input; this problem is now fixed.
  • Upon getting an A100 in Colab, I learned that 128+ base-dim unets have huge memory usage on the upscaling/secondary unets; so for A100s, a max_batch of 16 with 64 base dims seems to use about 98% of memory and works great.
  • DP (DataParallel) doesn't seem to work when trying it on a vast.ai 6x 3090 instance.
  • For 128-dim unets, similar (except not 32 dims) to the two created in the notebook shared in issue 24 on the imagen-pytorch GitHub repo, an A100-80GB seems able to train them if max_batch is around 14; 16 or higher results in an OOM. Dataset size may of course matter, but this was tried on roughly the first ~19,999 images of LAION-400M. I will likely post my notebook in the near future, either as links here or somewhere on my GitHub.
  • A 192-dim set of unets is possible on an A100-80GB if gradient accumulation is 24 and both unets are set to memory_efficient (see the config sketch after this list). Also, turning off scale_resnet_skip_connection on the unets at 192 dims improves loss early on; I'm not sure why, since it's usually fine at 128 dims and below.
  • I've started using more images from part 1 of LAION-400M, as well as T5-XL instead of T5-base, since the text embeddings may generally be better. (Thanks to @m9 and @Dan from LAION for these insights.)
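To make the memory_efficient and text-encoder bullets above concrete, here's roughly the kind of configuration I mean. The dims, attention settings, and prompt are illustrative, and flag availability/defaults may differ between imagen-pytorch versions:

```python
import torch
from imagen_pytorch import Unet, Imagen, ImagenTrainer

# base unet at a higher dim; memory_efficient trades some compute for lower peak VRAM
unet1 = Unet(
    dim = 192,
    cond_dim = 512,
    dim_mults = (1, 2, 4, 8),
    num_resnet_blocks = 3,
    layer_attns = (False, True, True, True),
    memory_efficient = True,
)

# super-resolution unet (64 -> 256), also memory_efficient
unet2 = Unet(
    dim = 192,
    cond_dim = 512,
    dim_mults = (1, 2, 4, 8),
    num_resnet_blocks = (2, 4, 8, 8),
    layer_attns = (False, False, False, True),
    layer_cross_attns = (False, False, False, True),
    memory_efficient = True,
)

imagen = Imagen(
    unets = (unet1, unet2),
    image_sizes = (64, 256),
    timesteps = 1000,
    cond_drop_prob = 0.1,
    text_encoder_name = 'google/t5-v1_1-xl',  # T5-XL embeddings instead of T5-base
).cuda()

trainer = ImagenTrainer(imagen)

# ... training as in the earlier sketch, using max_batch_size for gradient accumulation ...

# sampling with a conditioning scale of ~3, per the findings above (prompt is just an example)
images = trainer.sample(texts = ['a photo of a red bicycle'], cond_scale = 3.)
```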

W&B Changes

  • Starting from run sparkling-fire-5, unet dimensions were recorded.
  • Subsequent runs after the above-mentioned run also record average loss and image sizes.
  • Starting around skilled-thunder-9, the GPU used is now also recorded, after I got lucky enough to land an A100 on Google Colab!
  • I only recently (after different-pine-17) realized that I was not recording gradient accumulation values. This has been added.
  • Since proud-cloud-32, I've started monitoring a few more parameters, such as the memory_efficient flag and scale_resnet_skip_connection.
  • Runs after major-wind-38 test the new Elucidated Imagen implementation and more on version 1.5.9 of the repo (see the sketch after this list). Code and config changes were made to lean more on the repo's own implementations of things like text embeddings; configs shared in the imagen-pytorch section of the LAION Discord inspired changes as well.
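For context on the elucidated runs, swapping in the elucidated variant looks roughly like the sketch below. The values mirror the imagen-pytorch README's example rather than my exact config, so treat them as placeholders:

```python
from imagen_pytorch import Unet, ElucidatedImagen, ImagenTrainer

unet1 = Unet(dim = 128, cond_dim = 512, dim_mults = (1, 2, 4, 8),
             num_resnet_blocks = 3, layer_attns = (False, True, True, True))

unet2 = Unet(dim = 128, cond_dim = 512, dim_mults = (1, 2, 4, 8),
             num_resnet_blocks = (2, 4, 8, 8),
             layer_attns = (False, False, False, True),
             layer_cross_attns = (False, False, False, True))

# ElucidatedImagen replaces the usual DDPM schedule with the Karras et al.
# "elucidated" formulation; these values are the README's example numbers
imagen = ElucidatedImagen(
    unets = (unet1, unet2),
    image_sizes = (64, 256),
    cond_drop_prob = 0.1,
    num_sample_steps = (64, 32),
    sigma_min = 0.002,
    sigma_max = (80, 160),
    sigma_data = 0.5,
    rho = 7,
    P_mean = -1.2,
    P_std = 1.2,
    S_churn = 80,
    S_tmin = 0.05,
    S_tmax = 50,
    S_noise = 1.003,
).cuda()

trainer = ImagenTrainer(imagen)  # the training loop itself is the same as before
```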

Ghost Runs

  • lunar-durian-6 failed to run due to a CUDA OOM. Input image sizes had been changed to 256, and max_batch was at 11.
  • rich-salad-7 failed to run due to CUDA OOM. max_batch was 8.
  • decent-sunset-8 OOM'd when loading unet 2. Given the run graphs for memory allocated, I now know what allocation size to look out for.
  • lyric-meadow-15 and the run after it OOM'd when loading unet 2, tried on an A100-80GB. 128-dim unets at the super-res stage really seem to use a lot of memory; some hyperparameters are being tuned.
  • pretty-sponge-41 was run too far ahead of the last run's saved checkpoint; a new run will restart to resume that run's epoch at the correct step (see the checkpointing sketch after this list).
  • 44 - deleted due to a problem starting the training run (epoch 2 of last few runs).
  • 47-49 - issues running during 1.5.12 conversion.
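On the checkpoint/resume problem mentioned above: the trainer exposes save/load helpers, and the fix is essentially to reload the last checkpoint before continuing. A minimal sketch, with a hypothetical path:

```python
from imagen_pytorch import Unet, Imagen, ImagenTrainer

unet1 = Unet(dim = 32, cond_dim = 512, dim_mults = (1, 2, 4, 8),
             num_resnet_blocks = 3, layer_attns = (False, True, True, True))

imagen = Imagen(unets = (unet1,), image_sizes = (64,), timesteps = 1000)
trainer = ImagenTrainer(imagen)

CKPT = './checkpoint-unet1.pt'  # hypothetical path

# during training: save periodically (the trainer serializes the model and training state)
trainer.save(CKPT)

# when resuming: load the checkpoint before continuing the epoch at the right step
trainer.load(CKPT)
```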
