I am training a CNN-LSTM model, and every time I try to run the training I get the following CUDA out-of-memory error. I am working on an HPC server which has two GPUs, and using nn.DataParallel(model) doesn't seem to change anything, so I need some help, please.
image size: 224, batch_size = 4
image sequence = 64
You are running out of memory, so you would either need to reduce the memory requirement, e.g. by lowering the batch size, or you could trade compute for memory, e.g. via gradient checkpointing (torch.utils.checkpoint).
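The compute-for-memory trade can be sketched with torch.utils.checkpoint. The tiny sequential model and sizes below are placeholders, not the original CNN-LSTM:

```python
# Minimal sketch of trading compute for memory with gradient checkpointing.
# The model and shapes are toy placeholders for the actual CNN-LSTM.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

model = nn.Sequential(
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Linear(128, 128),
)

x = torch.randn(4, 128, requires_grad=True)

# checkpoint() does not store the intermediate activations of `model`;
# they are recomputed during the backward pass instead, saving memory
# at the cost of extra compute.
out = checkpoint(model, x, use_reentrant=False)
out.sum().backward()
```

For the real model you would wrap the expensive encoder (or individual blocks of it) in checkpoint() rather than the whole network.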
Thanks for your help, but aren't the allocated and reserved memory reported by PyTorch larger than they are supposed to be?
The memory requirement depends on the used model as well as the input shape. Don't forget that during training the intermediate forward activations, the gradients, and the optimizer's states (if applicable) will use additional memory.
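As a toy illustration of where that extra memory goes: after one training step, the gradients duplicate the parameter memory and Adam adds two running buffers per parameter. The small Linear model and Adam below are placeholders for the actual setup:

```python
# Toy accounting of training memory: parameters vs. gradients vs.
# optimizer states. The model and optimizer are placeholder choices.
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024)
optimizer = torch.optim.Adam(model.parameters())

param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

# One forward/backward/step allocates gradients and Adam's running stats.
out = model(torch.randn(8, 1024))
out.sum().backward()
optimizer.step()

# Gradients: one tensor per parameter, same size as the parameter.
grad_bytes = sum(p.grad.numel() * p.grad.element_size()
                 for p in model.parameters())

# Adam keeps two extra buffers (exp_avg, exp_avg_sq) per parameter.
state_bytes = sum(
    t.numel() * t.element_size()
    for s in optimizer.state.values()
    for t in s.values()
    if torch.is_tensor(t)
)
```

With Adam, parameters + gradients + optimizer states alone already take roughly 4x the parameter memory, before counting the forward activations.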
I am aware of the gradients, but this error happens even if I just pass the input data through the model, without any training.
If I understand your question correctly, the input size is [4, 64, 3, 224, 224]. That's essentially 256 RGB images of size 224x224. I do not know the type of model you are using, but I would imagine that, for a model like ResNet, this input size is already huge for a forward pass on a ~11 GB GPU. Try reducing the number of images in the sequence, the batch size, or the spatial size of the images.
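The back-of-the-envelope math (assuming float32 and RGB) shows the input tensor itself is modest; it's the per-layer activations for 256 images that blow up the memory:

```python
# Memory of the input tensor alone, assuming float32 RGB frames.
batch, seq, channels, height, width = 4, 64, 3, 224, 224
elements = batch * seq * channels * height * width  # 256 images in total
input_mb = elements * 4 / 1024**2                   # 4 bytes per float32
print(batch * seq, round(input_mb, 1))              # prints: 256 147.0
```

So ~147 MB for the input, but every conv layer in the encoder produces an activation tensor of comparable or larger size for all 256 frames at once.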
It is greyscale, not RGB. Also, I changed the spatial size to 64 and the batch size to 2, and it still doesn't fit. I use EfficientNet as an encoder before the LSTM.
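For reference, a CNN-LSTM of that shape usually encodes every frame with the CNN before feeding the feature sequence to the LSTM, which is why memory scales with batch x sequence length. A minimal sketch, with a tiny conv stack standing in for EfficientNet and toy sizes:

```python
# Minimal CNN-LSTM sketch. The tiny conv encoder is a placeholder for
# EfficientNet; the greyscale frames are flattened into the batch
# dimension, encoded, then reshaped back into a sequence for the LSTM.
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(          # placeholder for EfficientNet
            nn.Conv2d(1, feat_dim, 3, stride=2, padding=1),  # 1 channel: greyscale
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)

    def forward(self, x):                      # x: (batch, seq, 1, H, W)
        b, t = x.shape[:2]
        feats = self.encoder(x.flatten(0, 1))  # (batch*seq, feat_dim)
        out, _ = self.lstm(feats.view(b, t, -1))
        return out                             # (batch, seq, hidden)

model = CNNLSTM()
y = model(torch.randn(2, 8, 1, 64, 64))        # small sizes for the demo
```

Note the encoder sees batch*seq images in one forward pass, so its activations, not the LSTM's, typically dominate memory here.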
I haven't used EfficientNet a lot, so I'm not sure about its memory requirements.
I would try to reduce the number of images in the sequence as far as possible to get the lowest baseline. Then, starting from that baseline, increase the dimensions (batch size, spatial size, or number of images in a sequence) one by one to see the maximum that fits on your GPU.
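That probing can be automated by catching the CUDA OOM error, assuming a recent PyTorch (torch.cuda.OutOfMemoryError exists since 1.13). The model and input factory below are placeholders; on CPU the loop won't actually hit OOM, it just shows the pattern:

```python
# Rough sketch of probing the largest batch size that fits in memory.
import torch
import torch.nn as nn

def max_batch_size(model, make_input, candidates):
    """Return the largest candidate batch size whose forward pass fits."""
    best = None
    for bs in candidates:          # candidates in increasing order
        try:
            with torch.no_grad():  # probe the forward pass only
                model(make_input(bs))
            best = bs
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()
            break
    return best

model = nn.Linear(64, 64)          # placeholder for the CNN-LSTM
best = max_batch_size(model, lambda bs: torch.randn(bs, 64), [1, 2, 4, 8])
```

The same loop works for sequence length or spatial size: fix the other dimensions at the baseline and sweep one candidate list at a time. For training limits, probe with gradients enabled and a backward pass, since that needs considerably more memory than inference.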