Running into out of memory issues

Hi,

  • I’m currently trying to train a diffusion model for a 2D image generation task, with images as input.
  • Training on AWS G5 instances, i.e., A10G GPUs with 24 GB of GPU memory.
  • I run into out-of-memory errors when I go beyond an image size of 256x256 and a batch size of 8.
  • Results at image size 256 and batch size 8 are unacceptable.
  • I did use gradient accumulation and mixed precision training (roughly as in the sketch after this list).
  • Using only one attention block.
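
For reference, a minimal sketch of what that gradient accumulation + mixed precision loop typically looks like, assuming an fp16 autocast/GradScaler setup on CUDA; the tiny model, random batches, and MSE loss below are placeholders for the actual UNet, dataloader, and DDPM noise-prediction loss:

```python
import torch
import torch.nn as nn

# Placeholders for the actual UNet, optimizer, and dataloader; assumes a CUDA device.
model = nn.Conv2d(3, 3, 3, padding=1).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()
batches = [torch.randn(8, 3, 256, 256) for _ in range(8)]  # stand-in dataloader

accum_steps = 4  # effective batch size = 8 * 4 = 32

for step, images in enumerate(batches):
    images = images.cuda(non_blocking=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        pred = model(images)
        loss = (pred - images).pow(2).mean() / accum_steps  # stand-in for the DDPM loss
    scaler.scale(loss).backward()                           # fp16-safe backward
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)   # unscales grads; skips the step on inf/NaN
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
```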

Trying to understand: is this a genuine memory limitation, or can it be addressed by some other approach?
Are diffusion models so heavy that even a 24 GB GPU is insufficient?
What’s the typical memory requirement for running image-to-image diffusion models that generate images at resolutions higher than 512x512?

Thanks and regards
KVS Moudgalya

Welcome to the PyTorch Forums!

A couple of questions/comments:

  1. Can you provide a model summary?
  2. Are you using self-attention, and if so, what type?
  3. Typically, UNet diffusion models are trained on 512x512 images but can be extended to larger images, given that their entire structure involves convolutions.
  4. What float dtype are you using? You may find mixed precision or bfloat16 to be sufficient, and that will typically require ~half the memory of float32.
  5. What optimizer are you using? Different optimizers keep more or fewer state elements per parameter (see the rough estimate after this list).
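
As a rough back-of-the-envelope illustration of point 5 (ignoring activations, which usually dominate at 256x256 and above), here is how the static per-parameter memory compares for AdamW vs. plain SGD; the tiny model below is just a stand-in for your UNet:

```python
import torch.nn as nn

# Stand-in for the actual UNet; substitute your own model.
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.Conv2d(64, 3, 3, padding=1))
n_params = sum(p.numel() for p in model.parameters())

# Approximate static bytes per parameter in fp32 training:
#   weights (4) + grads (4) + AdamW exp_avg (4) + exp_avg_sq (4) = ~16 bytes/param
#   weights (4) + grads (4) for plain SGD without momentum       = ~8 bytes/param
adamw_gib = n_params * 16 / 2**30
sgd_gib = n_params * 8 / 2**30

print(f"params: {n_params / 1e6:.2f} M")
print(f"AdamW static memory: ~{adamw_gib:.3f} GiB (activations not included)")
print(f"SGD   static memory: ~{sgd_gib:.3f} GiB (activations not included)")
```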

Hey, thanks @J_Johnson.

  1. Employing a DDPM architecture with a UNet model.
  2. I’m using one Q,K,V attention block.
  3. Optimizer: torch AdamW.
  4. Yes, I have tried gradient accumulation and mixed precision training; it did not help.

So, I am trying to understand:
Are diffusion models so heavy that even a 24 GB GPU is insufficient?
What’s the typical GPU memory requirement for running image-to-image diffusion models that generate images at resolutions higher than 512x512?

There are some additional memory-efficient changes you can make within the UNet model architecture, as found in this code here:
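
As one example of that kind of change (not necessarily what the linked code does), you can apply activation checkpointing to the heavier UNet blocks with torch.utils.checkpoint, trading extra recomputation in the backward pass for a large reduction in stored activations. A minimal sketch, where the block below is a placeholder for one of your residual/down blocks:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Placeholder for one of the UNet's heavier residual/down blocks.
block = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1), nn.SiLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.SiLU(),
)

x = torch.randn(8, 64, 128, 128, requires_grad=True)

# Instead of y = block(x): activations inside the block are not stored for
# backward but recomputed, cutting activation memory at the cost of compute.
y = checkpoint(block, x, use_reentrant=False)
y.mean().backward()
```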

Additionally, there is a more efficient self-attention architecture, located here:

Just note that vanilla self-attention can be a bit resource hungry.
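
If you’re on PyTorch 2.x, one hedged alternative to a hand-rolled Q,K,V block (which materializes a full (H·W)×(H·W) attention matrix) is torch.nn.functional.scaled_dot_product_attention, which can dispatch to FlashAttention / memory-efficient kernels. A minimal sketch of such a block for 2D feature maps; the shapes, module names, and GroupNorm choice are illustrative and not the linked architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemEfficientSelfAttention(nn.Module):
    """Illustrative self-attention block over 2D feature maps (B, C, H, W)."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        assert channels % num_heads == 0
        self.num_heads = num_heads
        self.norm = nn.GroupNorm(8, channels)
        self.qkv = nn.Conv2d(channels, channels * 3, kernel_size=1)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.qkv(self.norm(x)).chunk(3, dim=1)

        # (B, C, H, W) -> (B, heads, H*W, C // heads)
        def to_heads(t: torch.Tensor) -> torch.Tensor:
            return t.reshape(b, self.num_heads, c // self.num_heads, h * w).transpose(-1, -2)

        q, k, v = map(to_heads, (q, k, v))
        # Dispatches to Flash / memory-efficient kernels where available,
        # avoiding an explicit (H*W) x (H*W) attention matrix.
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(-1, -2).reshape(b, c, h, w)
        return x + self.proj(out)

# Example usage on a 64-channel feature map:
attn = MemEfficientSelfAttention(64)
y = attn(torch.randn(2, 64, 32, 32))
```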

Lastly, UNets are quite slow to train, even on the most recent GPUs, especially once you get into larger sizes like 512x512, which is why they are usually trained on multiple TPUs. However, GigaGAN looks promising, and I see Phil Wang is almost done with his PyTorch version of it, here:


Thanks @J_Johnson. Will try this out and get back to you with the outcomes.
