I am doing some GAN research and I am running into a problem with memory efficiency.
Both networks together have a size of around 50 MB. However, training with batches of size 3 already uses all 12 GB of my GPU memory. The input consists of 512 x 512 images concatenated with some binary masks. I am aware that autograd needs to keep track of additional things for the backward pass, but this still seems like far too much to me.
I couldn’t find any useful memory efficiency tips, so any advice would be appreciated.
The problem is that you need to do backprop, and the memory this requires is far from negligible. In general, every intermediate activation has to be kept around so that gradients can be computed during the backward pass, and that is where most of the memory goes. Your 50 MB figure only counts the parameters; the stored activations scale with batch size, spatial resolution, and network depth, so for 512 x 512 inputs they can easily dominate.
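To see why activations dwarf the parameter count, here is a back-of-envelope estimate. The layer shapes below are made up for illustration (the original post doesn't describe the architectures), but the arithmetic shows how a few GB accumulate at 512 x 512 resolution with batch size 3:

```python
# Rough estimate of activation memory stored for the backward pass.
# Layer shapes are hypothetical, not the poster's actual networks.
bytes_per_float = 4  # float32
batch = 3

def feature_map_mib(channels, height, width):
    """Size in MiB of one stored activation tensor for the whole batch."""
    return batch * channels * height * width * bytes_per_float / 2**20

# A single 64-channel feature map at full 512 x 512 resolution:
one_map = feature_map_mib(64, 512, 512)  # 3 * 64 * 512 * 512 * 4 B = 192 MiB

# An encoder keeps many such maps alive until backward() runs.
# Even a modest 30-layer stack at mixed resolutions reaches GBs:
layers = ([(64, 512, 512)] * 4
          + [(128, 256, 256)] * 8
          + [(256, 128, 128)] * 18)
total = sum(feature_map_mib(c, h, w) for c, h, w in layers)
print(f"one map: {one_map:.0f} MiB, total activations: {total:.0f} MiB")
```

And that is per network, before gradients and optimizer state. If this is the bottleneck, trading compute for memory with `torch.utils.checkpoint.checkpoint` (recompute activations during backward instead of storing them) is one common remedy; reducing batch size or resolution are the others.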