Small model size but high GPU memory usage


I am doing some GAN research and I am running into a problem with memory efficiency.
Together, my two networks are only around 50 MB in size. However, training with a batch size of 3 already uses all of my GPU memory, i.e. 12 GB. The input consists of 512 x 512 images concatenated with some binary masks. I am aware that autograd needs to keep track of additional things for the backward pass, but this still seems excessive to me.

I couldn’t find any useful memory efficiency tips, so any advice would be appreciated.


The problem is that you need to do backprop, and the amount of memory that requires isn’t negligible. The activations of most layers have to be kept from the forward pass until the backward pass, and that is where the memory goes, not into the parameters.
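To get a feeling for the numbers, here is a rough back-of-the-envelope sketch. The channel count of 64 is only an assumption for illustration (your architecture may differ), and real usage also includes gradients, optimizer state and cuDNN workspace, so treat it as a lower bound rather than a measurement.

```python
def activation_bytes(batch, channels, height, width, bytes_per_element=4):
    """Size of one stored feature map in bytes (float32 by default)."""
    return batch * channels * height * width * bytes_per_element


# One hypothetical 64-channel feature map at full resolution, batch size 3.
one_map = activation_bytes(batch=3, channels=64, height=512, width=512)
print(f"{one_map / 2**20:.0f} MiB")  # 192 MiB

# The same feature map if the input were downscaled to 256 x 256.
smaller_map = activation_bytes(batch=3, channels=64, height=256, width=256)
print(f"{smaller_map / 2**20:.0f} MiB")  # 48 MiB
```

Under these assumptions a single stored activation is already several times larger than the 50 MB of parameters, and a network keeps one such tensor for many of its layers.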

You could try to use torch.utils.checkpoint to trade compute for memory.
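A minimal sketch of what that could look like, assuming a recent PyTorch version where `checkpoint` accepts `use_reentrant`. `Block` and `SmallNet` are made-up modules for illustration, not your GAN. The checkpointed blocks don’t keep their intermediate activations; they are recomputed during the backward pass.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    """Hypothetical conv block whose intermediate activations we do not want to store."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.body(x)


class SmallNet(nn.Module):
    """Hypothetical model; 4 input channels stand in for image + mask."""

    def __init__(self, in_channels=4, channels=8, use_checkpoint=True):
        super().__init__()
        self.use_checkpoint = use_checkpoint
        self.stem = nn.Conv2d(in_channels, channels, 3, padding=1)
        self.block1 = Block(channels)
        self.block2 = Block(channels)
        self.head = nn.Conv2d(channels, 1, 1)

    def forward(self, x):
        x = self.stem(x)
        if self.use_checkpoint:
            # Only the block inputs are saved; the rest is recomputed in backward.
            x = checkpoint(self.block1, x, use_reentrant=False)
            x = checkpoint(self.block2, x, use_reentrant=False)
        else:
            x = self.block2(self.block1(x))
        return self.head(x)


torch.manual_seed(0)
model = SmallNet()
x = torch.randn(2, 4, 32, 32)
loss = model(x).mean()
loss.backward()  # gradients match the non-checkpointed run
```

The price is roughly one extra forward pass through each checkpointed block per training step.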


The input size of your model has a huge influence on the memory usage. Maybe try a smaller input size.

Thank you all for the answers. I guess it is just something I cannot avoid.

@ptrblck Can I use torch.utils.checkpoint on the whole model? Is there an example to see if I understood the docs correctly?

I always refer to @Priya_Goyal’s tutorial.
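As for the whole model: wrapping everything in a single checkpoint usually doesn’t help much, because the backward pass then recomputes the full forward pass and holds all activations at once again. Splitting the model into several segments works better. Here is a small sketch with `checkpoint_sequential`, assuming your model can be written as an `nn.Sequential` and a recent PyTorch version that accepts `use_reentrant` here. The layer sizes and the segment count of 4 are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Hypothetical stack of 8 conv + ReLU pairs (16 modules in total).
layers = []
for _ in range(8):
    layers += [nn.Conv2d(8, 8, 3, padding=1), nn.ReLU()]
model = nn.Sequential(*layers)

x = torch.randn(2, 8, 32, 32, requires_grad=True)

# Split the stack into 4 segments; activations inside a segment are
# recomputed during backward instead of being stored.
out = checkpoint_sequential(model, 4, x, use_reentrant=False)
out.mean().backward()
```

More segments generally mean lower peak memory inside each segment but more segment inputs to store, so the best number depends on the model.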

That’s great. Thanks!