CUDA memory continuously increases when net(images) is called in every iteration

Hi, I have a very strange error: when I call outputs = net(images) within every iteration of a for loop, the CUDA memory usage keeps increasing until the GPU runs out of memory.

The weird part is that the issue only appears when the loop is inside a function. If the same code sits at the top level of my script, it works just fine. What could be the cause of this?


This is because PyTorch builds the graph again and again, and all the intermediate states are stored.

In training, those states are cleared when you call backward.
During test time, however, you could use Variable(xxx, volatile=True). I don’t know if that applies to your case, given the weird situation you describe.

A good reference is this Understanding graphs and state.
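For anyone reading this on a modern PyTorch version: the `volatile=True` flag was removed in PyTorch 0.4, and `torch.no_grad()` is the current way to skip graph building at test time. A minimal sketch (the layer and tensor shapes here are just placeholders, not from the thread):

```python
import torch

net = torch.nn.Linear(8, 2)   # placeholder model
x = torch.randn(4, 8)

# Inside no_grad(), no graph is built and no intermediate
# states are stored, so memory stays flat across iterations.
with torch.no_grad():
    out = net(x)

print(out.requires_grad)  # False: the output carries no graph
```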


It’s hard to guess what’s happening; you’re probably holding on to the output, or to the loss, for too long. If you’re accumulating the losses over multiple batches, use loss.data[0].


I think that did it! Ugh! I spent two days on this! >< Thanks though… so what was going on here?

@ruotianluo Thanks - if I understand you correctly, are you saying that, since I am accumulating a loss, that is composed of:

for i in xrange(100):
    out = net(input) 
    loss = someLossFunction(out)

, AND, since the loss here is a Torch Variable, that this will cause the graph to be built over and over again? Have I understood you correctly?

The graph isn’t re-made in place; rather, the 100 loop iterations create 100 separate graphs. The 100 graphs share the same input and parameters, but all the intermediate variables of the 100 graphs (although they could be identical) are saved separately.
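The point above can be illustrated with a toy model of graph growth (this is a sketch, not PyTorch’s actual internals): every forward pass creates fresh node objects that hold references to their saved intermediates, and accumulating the outputs keeps all of them reachable.

```python
# Toy sketch of graph growth -- not PyTorch internals.
class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents              # saved intermediates stay referenced

def forward(x):
    a = Node(x * 2)                         # intermediate state
    return Node(a.value + 1, (a,))          # output node keeps `a` alive

def count_reachable(node, seen=None):
    seen = set() if seen is None else seen
    if id(node) not in seen:
        seen.add(id(node))
        for p in node.parents:
            count_reachable(p, seen)
    return len(seen)

total = Node(0.0)
for i in range(100):
    out = forward(float(i))
    total = Node(total.value + out.value, (total, out))  # like `loss += ...`

print(count_reachable(total))  # 301: every pass's nodes are still alive
```

Each iteration adds three nodes (two from the forward pass, one for the running total), and none of them can be freed because the accumulated result still references them all.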


Does this mean that simply having Variable called in a loop will cause the graph to be re-made? This is the part that I am not getting… thanks.

for i in xrange(100):
    out = net(input)
    loss += someLossFunction(out)    # BAD: keeps extending the graph across the for-loop

    loss = someLossFunction(out)     # this is fine: the previous graph becomes unreachable

    loss = someLossFunction(out)
    total_loss += loss.data[0]       # this is fine: accumulates a plain number, not a Variable
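On current PyTorch (0.4 and later), `loss.data[0]` has been replaced by `loss.item()`; either way, the point is to accumulate a plain Python number rather than a Variable/tensor that drags its graph along. A sketch with placeholder model and shapes:

```python
import torch

net = torch.nn.Linear(8, 1)   # placeholder model
total_loss = 0.0
for _ in range(10):
    out = net(torch.randn(4, 8))
    loss = out.pow(2).mean()
    # .item() yields a detached Python float, so this iteration's
    # graph becomes unreachable and its memory can be freed.
    total_loss += loss.item()

print(type(total_loss))  # plain float, no graph attached
```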

Thanks @smth, I think I get it now: since the loss is a Variable, accumulating it will keep making the graph longer and longer, chaining an entire new graph on each iteration… I think.

I’m just asking for the sake of understanding. Let me put it differently: are you saying that this statement here will make two graphs that are identical to each other?

loss = someLossFunction(input1) + someLossFunction(input2)

Is my conclusion correct?


yes, your conclusion is correct.


To clarify for RNN users that might come across this, we do actually want to keep the graph around for backprop-through-time. (Right? Or is there a better way?)

we do want to keep the graph around for BPTT, but you have to call .detach() at the end of each BPTT window, so that the graph doesn’t keep growing infinitely.
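A minimal sketch of truncated BPTT with `.detach()` (the recurrence and window size here are made up for illustration): backprop through each window, then cut the history so the next window starts a fresh graph.

```python
import torch

W = torch.nn.Parameter(torch.randn(4, 4) * 0.1)  # toy recurrent weight
h = torch.randn(1, 4)                            # toy hidden state
window = 5                                        # BPTT truncation length

for step in range(20):
    h = torch.tanh(h @ W)                         # toy recurrence
    if (step + 1) % window == 0:
        loss = h.pow(2).sum()
        loss.backward()                           # backprop through this window only
        h = h.detach()                            # cut the graph so it can't keep growing

print(h.requires_grad)  # False after the final detach
```

Without the `detach()`, each `backward()` after the first would try to reach back through all earlier steps, and the graph (and its memory) would grow without bound.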

This should be written in the tutorial!


@smth, could you elaborate on the .detach() usage?

Hi, can the total_loss here be backpropagated?


this solved my problem

This solved my problem too. I didn’t know that accumulating loss instead of loss.data[0] would keep extending the graph.


This is really important. Everyone learning PyTorch should know this from the start.

Hello, everyone.
Besides this problem, what else could increase GPU memory usage (leak GPU memory)?
I have fixed my loss function code, but it still doesn’t work.
Albert Christianto