From this thread I learned that whenever I append a tensor to a list, I also keep its entire computational graph alive.
To fix this I could detach the tensors before storing them; however, in my case I need them for gradient computation. For each pixel of the image I have to store one state tensor, and only once all state tensors are computed can I compute the output. With n layers I need to store n times as many tensors.
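A minimal sketch of the pattern I mean (the names and shapes here are just illustrative, not my actual model):

```python
import torch

# One state tensor is stored per "pixel"; each stored tensor keeps a
# reference to the graph that produced it, so memory grows with every
# append until backward() releases the graphs.
layer = torch.nn.Linear(4, 4)

states = []
for _ in range(3):  # stands in for looping over all pixels
    x = torch.randn(1, 4)
    h = layer(x)
    states.append(h)  # keeps the whole graph alive
    # states.append(h.detach())  # would free the graph, but breaks gradients

# Every stored tensor still carries its graph:
print(all(s.grad_fn is not None for s in states))
```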
Currently I use 64x84 images, and with two layers of 256 units each I already run out of memory on my university's cluster.
Is there some way to reduce the memory needed, or is my architecture simply unable to scale to larger sizes?