Question about memory usage of math operation

fanl · November 18, 2019, 11:13am

Hi,
In my case, I calculate the loss with the following code:

loss = 0
for A,B,C in sequence_of_tensors:
    loss += ((A - B)**2 * k1 * (A > C).float() +
             (A - B)**2 * k2 * (A < C).float() +
             (A - B)**2 * k3 * (A == C).float()).sum()

A and B are tensors. After I put all the tensors on the GPU, memory usage is about 5GB. But when the computation begins, I get a CUDA out of memory error. My GPU has 12GB of memory, which I think should be enough. The intermediate results of the computation should not be larger than the original tensors.
I can’t understand why this computation is so memory-intensive. I’d appreciate any help.

JuanFMontesinos · November 18, 2019, 1:21pm

I’d say it depends on the length of sequence_of_tensors.
Are all the tensors in sequence_of_tensors the same shape? If so, you can try computing the loss batch-wise rather than list-wise.
I would also store (A - B)**2 in a variable: you compute it 3 times, and I think each occurrence creates its own graph nodes and its own intermediate tensors.
So, in short, I would try something like:

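# assumes the sequence has already been stacked into three batched tensors
A, B, C = stacked_sequence_of_tensors
tmp = (A - B)**2  # computed once and reused in all three terms
loss = (tmp * k1 * (A > C).float() +
        tmp * k2 * (A < C).float() +
        tmp * k3 * (A == C).float()).sum()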

fanl · November 18, 2019, 1:53pm

Thanks! Using tmp alleviates the problem. I am surprised by how much memory the calculation uses: just the first iteration of the loop is enough to run out of memory. It seems to be much more complicated than just storing intermediate results.
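
A rough back-of-the-envelope count may help explain the surprise. Every arithmetic or comparison operator in the expression returns a new tensor with as many elements as A, so a single iteration can allocate several times the size of its inputs. The sketch below is only an upper bound, under assumptions that are not stated in the thread: float32 inputs of identical shape, scalar k1, k2, k3, 1-byte comparison masks, and no temporary being freed before .sum(). In practice some temporaries are released early, while autograd keeps others alive for the backward pass. The function name temp_bytes and the element count are made up for illustration.

```python
# Upper-bound estimate of the temporaries created by one loop iteration.
# Assumptions (not from the thread): float32 tensors of identical shape,
# scalar k1..k3, 1-byte comparison masks, nothing freed before .sum().
FLOAT_BYTES = 4
MASK_BYTES = 1


def temp_bytes(n_elements, reuse_diff):
    """Bytes of temporaries for one evaluation of the three-term loss."""
    # Per term: (tmp * k), mask.float(), and their product -> 3 float tensors.
    per_term_floats = 3
    # (A - B) and (A - B)**2: built once if stored in tmp, else once per term.
    diff_floats = 2 if reuse_diff else 2 * 3
    # Two additions join the three terms.
    add_floats = 2
    n_floats = 3 * per_term_floats + diff_floats + add_floats
    n_masks = 3  # A > C, A < C, A == C
    return n_elements * (n_floats * FLOAT_BYTES + n_masks * MASK_BYTES)


n = 100_000_000  # hypothetical number of elements per tensor
inputs = 3 * n * FLOAT_BYTES  # A, B and C themselves

print(f"without tmp: {temp_bytes(n, reuse_diff=False) / inputs:.1f}x the inputs")
print(f"with tmp:    {temp_bytes(n, reuse_diff=True) / inputs:.1f}x the inputs")
```

Under these assumptions the temporaries come to roughly 5.9 times the size of A, B and C combined without tmp, and about 4.6 times with it, which would be consistent with tmp helping but not removing the problem.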

JuanFMontesinos · November 18, 2019, 2:12pm

I think it is also related to the for loop you are using. I would try to stack the tensors inside the sequence into batched tensors if possible. That will save a lot of memory.
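
A minimal sketch of the stacking idea, assuming every (A, B, C) triple in the sequence has the same shape (torch.stack requires this). The data, the weights k1, k2, k3, and the helper names loop_loss and stacked_loss are made up for illustration. The sketch only checks that the batched version gives the same loss as the loop. Note that torch.stack copies its inputs into a new tensor, so whether peak memory actually drops depends on the case and is worth measuring, for example with torch.cuda.max_memory_allocated().

```python
import torch

torch.manual_seed(0)
k1, k2, k3 = 1.0, 2.0, 3.0  # hypothetical scalar weights

# Hypothetical data: 4 (A, B, C) triples of identical shape. Small integer
# values are used so that all three cases (>, <, ==) actually occur.
sequence_of_tensors = [
    tuple(torch.randint(0, 3, (5, 3)).float() for _ in range(3))
    for _ in range(4)
]


def loop_loss(seq):
    """Loss accumulated triple by triple, as in the original loop."""
    loss = 0
    for A, B, C in seq:
        tmp = (A - B) ** 2
        loss = loss + (tmp * k1 * (A > C).float()
                       + tmp * k2 * (A < C).float()
                       + tmp * k3 * (A == C).float()).sum()
    return loss


def stacked_loss(seq):
    """Same loss computed once on tensors stacked along a new first dim."""
    A = torch.stack([a for a, _, _ in seq])
    B = torch.stack([b for _, b, _ in seq])
    C = torch.stack([c for _, _, c in seq])
    tmp = (A - B) ** 2
    return (tmp * k1 * (A > C).float()
            + tmp * k2 * (A < C).float()
            + tmp * k3 * (A == C).float()).sum()


same = torch.allclose(loop_loss(sequence_of_tensors),
                      stacked_loss(sequence_of_tensors))
print(same)
```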