Running into out of memory issues

Hi,

  • I’m currently trying to train a diffusion model for a 2D image generation task, with images as input.
  • Training on AWS G5 instances, i.e., A10G GPUs with 24 GB of GPU memory.
  • I run into out-of-memory errors when I go beyond an image size of 256x256 and a batch size of 8.
  • Results at image size 256 and batch size 8 are unacceptable.
  • I did use gradient accumulation and mixed precision training (roughly as in the sketch after this list).
  • Using only one attention block.
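
For reference, a minimal sketch of what that gradient accumulation + mixed precision loop typically looks like, assuming an fp16 autocast/GradScaler setup on CUDA; the tiny model, random batches, and MSE loss below are placeholders for the actual UNet, dataloader, and DDPM noise-prediction loss:

```python
import torch
import torch.nn as nn

# Placeholders for the actual UNet, optimizer, and dataloader; assumes a CUDA device.
model = nn.Conv2d(3, 3, 3, padding=1).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()
batches = [torch.randn(8, 3, 256, 256) for _ in range(8)]  # stand-in dataloader

accum_steps = 4  # effective batch size = 8 * 4 = 32

for step, images in enumerate(batches):
    images = images.cuda(non_blocking=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        pred = model(images)
        loss = (pred - images).pow(2).mean() / accum_steps  # stand-in for the DDPM loss
    scaler.scale(loss).backward()                           # fp16-safe backward
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)   # unscales grads; skips the step on inf/NaN
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
```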

Trying to understand: is this a genuine memory limitation, or can it be addressed by some other approach?
Are diffusion models so heavy that even a 24 GB GPU is insufficient?
What’s the typical memory requirement for running image-to-image diffusion models that generate images at resolutions higher than 512x512?

Thanks and regards
KVS Moudgalya

Welcome to the PyTorch Forums!

A couple of questions/comments:

  1. Can you provide a model summary?
  2. Are you using self-attention, and if so, what type?
  3. Typically, UNet diffusion models are trained on 512x512 images but can be extended to larger images, given that their entire structure involves convolutions.
  4. What float dtype are you using? You may find mixed precision or bfloat16 to be sufficient, and that will typically require ~half the memory of float32.
  5. What optimizer are you using? Different optimizers keep more or fewer state elements per parameter (see the rough estimate after this list).
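
As a rough back-of-the-envelope illustration of point 5 (ignoring activations, which usually dominate at 256x256 and above), here is how the static per-parameter memory compares for AdamW vs. plain SGD; the tiny model below is just a stand-in for your UNet:

```python
import torch.nn as nn

# Stand-in for the actual UNet; substitute your own model.
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.Conv2d(64, 3, 3, padding=1))
n_params = sum(p.numel() for p in model.parameters())

# Approximate static bytes per parameter in fp32 training:
#   weights (4) + grads (4) + AdamW exp_avg (4) + exp_avg_sq (4) = ~16 bytes/param
#   weights (4) + grads (4) for plain SGD without momentum       = ~8 bytes/param
adamw_gib = n_params * 16 / 2**30
sgd_gib = n_params * 8 / 2**30

print(f"params: {n_params / 1e6:.2f} M")
print(f"AdamW static memory: ~{adamw_gib:.3f} GiB (activations not included)")
print(f"SGD   static memory: ~{sgd_gib:.3f} GiB (activations not included)")
```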

Hey, thanks @J_Johnson.

  1. Employing a DDPM architecture with a UNet model.
  2. I’m using one Q,K,V attention block.
  3. Optimizer: torch AdamW.
  4. Yes, I have tried gradient accumulation and mixed precision training; it did not help.

So, I am trying to understand:
Are diffusion models so heavy that even a 24 GB GPU is insufficient?
What’s the typical GPU memory requirement for running image-to-image diffusion models that generate images at resolutions higher than 512x512?

There are some additional memory-efficient changes you can make within the UNet model architecture, as found in this code here:
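
As one example of that kind of change (not necessarily what the linked code does), you can apply activation checkpointing to the heavier UNet blocks with torch.utils.checkpoint, trading extra recomputation in the backward pass for a large reduction in stored activations. A minimal sketch, where the block below is a placeholder for one of your residual/down blocks:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Placeholder for one of the UNet's heavier residual/down blocks.
block = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1), nn.SiLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.SiLU(),
)

x = torch.randn(8, 64, 128, 128, requires_grad=True)

# Instead of y = block(x): activations inside the block are not stored for
# backward but recomputed, cutting activation memory at the cost of compute.
y = checkpoint(block, x, use_reentrant=False)
y.mean().backward()
```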

Additionally, there is a more efficient self-attention architecture, located here:

Just note that vanilla self-attention can be a bit resource hungry.
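
If you’re on PyTorch 2.x, one hedged alternative to a hand-rolled Q,K,V block (which materializes a full (H·W)×(H·W) attention matrix) is torch.nn.functional.scaled_dot_product_attention, which can dispatch to FlashAttention / memory-efficient kernels. A minimal sketch of such a block for 2D feature maps; the shapes, module names, and GroupNorm choice are illustrative and not the linked architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemEfficientSelfAttention(nn.Module):
    """Illustrative self-attention block over 2D feature maps (B, C, H, W)."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        assert channels % num_heads == 0
        self.num_heads = num_heads
        self.norm = nn.GroupNorm(8, channels)
        self.qkv = nn.Conv2d(channels, channels * 3, kernel_size=1)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.qkv(self.norm(x)).chunk(3, dim=1)

        # (B, C, H, W) -> (B, heads, H*W, C // heads)
        def to_heads(t: torch.Tensor) -> torch.Tensor:
            return t.reshape(b, self.num_heads, c // self.num_heads, h * w).transpose(-1, -2)

        q, k, v = map(to_heads, (q, k, v))
        # Dispatches to Flash / memory-efficient kernels where available,
        # avoiding an explicit (H*W) x (H*W) attention matrix.
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(-1, -2).reshape(b, c, h, w)
        return x + self.proj(out)

# Example usage on a 64-channel feature map:
attn = MemEfficientSelfAttention(64)
y = attn(torch.randn(2, 64, 32, 32))
```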

Lastly, UNets are quite slow to train, even on the most recent GPUs, especially once you get into larger sizes like 512x512, which is why they are usually trained on multiple TPUs. However, GigaGAN looks promising, and I see Phil Wang is almost done with his PyTorch version of it, here:


Thanks @J_Johnson. Will try this out and get back to you with the outcomes.
