Is this memory usage normal (ResUnet): 7 GB for one image?

I am currently working with an R2U-Net from this paper: [1802.06955] Recurrent Residual Convolutional Neural Network based on U-Net (R2U-Net) for Medical Image Segmentation.

The model contains ~40 million parameters with a 1->64->128->256->512->1024->512->256->128->64->1 architecture, which is similar to a ResNet-101 in terms of parameter count.

However, when training this model with 448x448 images, it takes almost 7 GB of memory on my GPU, which seems pretty big. I can only use a batch size of 1.

Do you think this is normal, or could it rather come from unoptimized code? I lack knowledge about plausible memory usage for such architectures.

It might be expected, and you could check the number of trainable parameters, buffers, and the shapes of all intermediate activations to verify it. This post gives you an example of how to do so.
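Something along these lines (a minimal sketch; `model`, the input shape, and the float32 assumption are placeholders for your setup) counts parameters and buffers and records one output shape per leaf module:

```python
import torch

def estimate_memory(model, input_shape, device="cuda"):
    # Parameters and buffers (assuming float32, i.e. 4 bytes per element).
    param_bytes = sum(p.numel() for p in model.parameters()) * 4
    buffer_bytes = sum(b.numel() for b in model.buffers()) * 4

    # Record one output shape per leaf module via forward hooks.
    # Note: a module that is called several times is only counted once here,
    # so this underestimates activations of recurrently reused layers.
    act_elems = {}
    def hook(module, inp, out):
        if isinstance(out, torch.Tensor):
            act_elems[module] = out.numel()

    handles = [m.register_forward_hook(hook) for m in model.modules()
               if len(list(m.children())) == 0]  # leaf modules only

    model.to(device)
    with torch.no_grad():
        model(torch.randn(*input_shape, device=device))
    for h in handles:
        h.remove()

    act_bytes = sum(act_elems.values()) * 4
    print(f"parameters : {param_bytes / 1024**3:.2f} GB")
    print(f"buffers    : {buffer_bytes / 1024**3:.2f} GB")
    print(f"activations: {act_bytes / 1024**3:.2f} GB (forward shapes only, rough)")

# e.g. estimate_memory(model, (1, 1, 448, 448))
```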
Alternatively, you could also add debug print statements to the forward method and check how much memory is used where via print(torch.cuda.memory_summary()).
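A minimal sketch of this second approach (the wrapper and the block names are hypothetical, not taken from the R2U-Net code):

```python
import torch
import torch.nn as nn

def report(tag):
    # Print current and peak allocated memory on the default CUDA device.
    print(f"{tag}: {torch.cuda.memory_allocated() / 1024**3:.2f} GB allocated, "
          f"{torch.cuda.max_memory_allocated() / 1024**3:.2f} GB peak")

class MemoryDebug(nn.Module):
    # Hypothetical wrapper: runs the wrapped module and reports memory afterwards.
    def __init__(self, name, module):
        super().__init__()
        self.name = name
        self.module = module

    def forward(self, *args, **kwargs):
        out = self.module(*args, **kwargs)
        report(self.name)
        return out

# e.g. model.down1 = MemoryDebug("down1", model.down1), then run a forward pass;
# print(torch.cuda.memory_summary()) gives the full allocator report if needed.
```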

I used your code and it tells me the model should take 2.7 GB, which is far from the 6-7 GB I observe. However, I have recurrent layers that loop over the same conv module several times, so I suppose I have to adapt what you did to get a proper estimate, because it underestimates the number of activations.
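For reference, one way I could adapt it (a rough sketch, assuming float32 and that the recurrence happens at module-call granularity): let the hooks accumulate on every call, so a conv that is looped over t times contributes t outputs:

```python
from collections import Counter
import torch

def activation_report(model, example_input):
    # Unlike counting each module once, these hooks fire on every call, so a
    # conv that is looped over t times in a recurrent block is counted t times.
    elems = Counter()   # name -> total output elements over all calls
    calls = Counter()   # name -> number of forward calls
    handles = []
    for name, m in model.named_modules():
        if len(list(m.children())) == 0:  # leaf modules only
            def hook(module, inp, out, n=name):
                if isinstance(out, torch.Tensor):
                    elems[n] += out.numel()
                    calls[n] += 1
            handles.append(m.register_forward_hook(hook))

    with torch.no_grad():
        model(example_input)  # example_input must live on the model's device
    for h in handles:
        h.remove()

    total_gb = sum(elems.values()) * 4 / 1024**3  # assuming float32
    print(f"total activations over all calls: ~{total_gb:.2f} GB")
    for n, c in calls.most_common(10):
        print(f"{n}: {c} call(s), {elems[n] * 4 / 1024**2:.1f} MB")
```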

I also called print(torch.cuda.memory_summary()) at the end of a forward pass, and it tells me that the current usage is 4.1 GB and the peak usage is 8.4 GB. The peak seems to be a bit above what I observe, but I suppose that is because I used DataParallel and the first GPU therefore uses a bit more memory.
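For completeness, something like this should show the per-GPU difference (a small sketch; memory_summary() only reports the current device by default):

```python
import torch

# Compare usage across all visible GPUs; with nn.DataParallel the default
# device (cuda:0) gathers the outputs and therefore tends to peak higher.
for i in range(torch.cuda.device_count()):
    alloc = torch.cuda.memory_allocated(i) / 1024**3
    peak = torch.cuda.max_memory_allocated(i) / 1024**3
    print(f"cuda:{i}: {alloc:.2f} GB allocated, {peak:.2f} GB peak")
```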

Did you only calculate the parameters/buffers or also the intermediate activations?

Yes, this sounds reasonable, and my simple code snippet might not be able to capture the recurrent usage of modules. In any case, it is not a perfect calculation, but it can be used to estimate which part of the model uses the majority of the memory.
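As a rough sanity check with your numbers (batch size 1, float32, 448x448 input, 64 channels at the first stage):

```python
# A single 64-channel float32 activation at 448x448 resolution:
64 * 448 * 448 * 4 / 1024**2   # ~49 MB
# Dozens of such tensors (plus their recurrent repeats) are kept for the
# backward pass, so a few GB of activation memory alone is plausible.
```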