Manage GPU memory efficiently

I am working on a model that takes 8 GB of memory on the forward pass (it uses only MaxPool3d and Conv3d layers). So if I call the backward function, will the memory explode to 16 GB? Any tips on how to reduce memory usage (apart from reducing memory)?
I could set reduce_grad = false for the inputs, but that would make very little difference. Are there other tricks?

Are you running out of memory or is it just a theoretical question?

What do you mean by reduce_grad for inputs?

You could have a look at torch.utils.checkpoint to trade compute for memory or just lower your batch size, if that’s possible.
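A minimal sketch of what that could look like, assuming a recent PyTorch release where `checkpoint` accepts `use_reentrant`. `Tiny3DNet` and the input shape are hypothetical stand-ins, not your actual model:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Tiny3DNet(nn.Module):
    """Hypothetical small 3D CNN, used only to illustrate checkpointing."""

    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv3d(3, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2)
        )
        self.block2 = nn.Sequential(
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2)
        )
        self.head = nn.Linear(16, 2)

    def forward(self, x):
        # Activations inside each checkpointed block are not stored;
        # they are recomputed during the backward pass instead.
        x = checkpoint(self.block1, x, use_reentrant=False)
        x = checkpoint(self.block2, x, use_reentrant=False)
        x = x.mean(dim=(2, 3, 4))  # global average pool over (T, H, W)
        return self.head(x)


model = Tiny3DNet()
video = torch.randn(1, 3, 8, 16, 16)  # (batch, channels, frames, height, width)
out = model(video)
out.sum().backward()  # each block's forward is re-run here
```

The cost is roughly one extra forward pass per checkpointed block, so how much memory you get back depends on where you place the checkpoints.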


I have yet to build the loss function. So far I have only run the forward pass, and I was afraid that backward wouldn't work (it would run out of memory), because storing the gradients would take as much space as storing the weights, so the usage would double.
I can't reduce the batch size: I am using 3D CNNs for videos, and 8 GB is a single video (65 frames), i.e. a single batch element.

And sorry, I meant requires_grad = False, not reduce_grad.

Are you suggesting that it might fit into 12 GB, given an 8 GB forward pass? (I will definitely try it, but I wanted to know how things work inside.)
I will also look at torch.utils.checkpoint, thanks!

Ah ok, I see.

The input is usually defined with requires_grad=False, so you won’t save any memory there.
You are most likely right that it will be really close to the limit. Maybe it’ll be possible using checkpoint.
Let me know, if you’ve tried it out.
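As a quick illustration of the requires_grad point, here is a small sketch (the shapes and the single Conv3d layer are arbitrary choices for the example):

```python
import torch
import torch.nn as nn

# A freshly created input tensor does not track gradients by default.
x = torch.randn(1, 3, 8, 16, 16)
print(x.requires_grad)  # False

# Module parameters do require gradients by default.
conv = nn.Conv3d(3, 4, kernel_size=3)
print(all(p.requires_grad for p in conv.parameters()))  # True

conv(x).sum().backward()
print(x.grad is None)          # True: no gradient is stored for the input
print(conv.weight.grad.shape)  # torch.Size([4, 3, 3, 3, 3])
```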

I’ve edited the previous post, since I think you were right about the forward/backward memory usage.
Sorry for the confusion!

Yeah, I will probably test it by the weekend and report back :)

@ptrblck I performed the tests.
Surprisingly, it doesn’t double the memory. In fact, a model that used 8.3 GB on the forward pass took less than 9 GB on the backward pass. It might have allocated space for the gradients during the forward pass already.
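For anyone who wants to check numbers like these on their own model, here is a rough sketch using the peak-memory counters in torch.cuda. The helper name `peak_memory_mb` and the toy layer are made up for the example, and it only measures memory allocated by PyTorch tensors, not what nvidia-smi reports:

```python
import torch
import torch.nn as nn


def peak_memory_mb(model, x):
    """Return (peak after forward, peak after backward) in MB, or None without CUDA."""
    if not torch.cuda.is_available():
        return None
    device = torch.device("cuda")
    model, x = model.to(device), x.to(device)

    torch.cuda.reset_peak_memory_stats(device)
    out = model(x)
    torch.cuda.synchronize(device)
    fwd_peak = torch.cuda.max_memory_allocated(device) / 2**20

    out.sum().backward()
    torch.cuda.synchronize(device)
    # The counter is a running maximum, so this value is >= fwd_peak.
    bwd_peak = torch.cuda.max_memory_allocated(device) / 2**20
    return fwd_peak, bwd_peak


print(peak_memory_mb(nn.Conv3d(3, 8, kernel_size=3), torch.randn(1, 3, 8, 32, 32)))
```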

Also, I just noticed that Variable has been deprecated! I will have to update my code.
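For reference, since PyTorch 0.4 tensors track gradients directly, so the Variable wrapper is no longer needed. A tiny sketch of the updated style:

```python
import torch

# Previously: w = Variable(torch.randn(3), requires_grad=True)
w = torch.randn(3, requires_grad=True)

loss = (w * 2).sum()
loss.backward()
print(w.grad)  # tensor([2., 2., 2.])
```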