What is an elegant way to release GPU memory in a 'for' loop

I just ran a very simple piece of code and hit an out of memory error. I think it is caused by the delayed release of the variables inside the for loop.

# some initialization code
net.train()
for batch_idx, (data, label) in enumerate(trainloader):
    data = Variable(data.cuda())
    label = Variable(label.cuda())
    output = net(data)

# end of the script

On a 12 GB Titan Xp graphics card, the first iteration uses 8 GB of GPU memory with a batch size of 40. The second iteration then hits an out of memory error, because the memory from the first iteration is not released: the variable output (and the computation graph behind it) is still alive.
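(As a side note, if your PyTorch build exposes torch.cuda.memory_allocated, you can watch this happen by printing it at the end of each iteration; a minimal sketch:)

# sketch: report memory currently held by tensors after each iteration
# (needs a PyTorch version that provides torch.cuda.memory_allocated)
print('allocated: %.1f MB' % (torch.cuda.memory_allocated() / 1024 ** 2))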

One way to avoid the out of memory error is to decrease the batch size so that a single iteration stays under 6 GB, but that roughly halves the memory efficiency.
Another way is to explicitly del output, as follows:

# some initialization code
net.train()
for batch_idx, (data, label) in enumerate(trainloader):
    data = Variable(data.cuda())
    label = Variable(label.cuda())
    output = net(data)
    del output

# end of the script

Is using the del statement the standard way to do this? Is there a more appropriate way to release GPU memory in a loop, or am I misunderstanding PyTorch's memory management?

As you said, the script you have keeps the computation graph alive until output = ... is reassigned in the second iteration. In the usual case, where a loss is backpropagated in each iteration, the graph is freed after calling backward, so the OOM cannot happen.
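For reference, a minimal sketch of that usual pattern; criterion and optimizer are hypothetical names assumed to come from the initialization code:

# sketch of the usual training loop; criterion (e.g. nn.CrossEntropyLoss())
# and optimizer are assumed to be created in the initialization code
net.train()
for batch_idx, (data, label) in enumerate(trainloader):
    data = Variable(data.cuda())
    label = Variable(label.cuda())
    output = net(data)
    loss = criterion(output, label)
    optimizer.zero_grad()
    loss.backward()   # frees the graph built during the forward pass
    optimizer.step()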

The other way to do this is to wrap each iteration in a closure, so that output and its graph go out of scope when the call returns.
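A rough sketch of what that could look like (run_iteration is a hypothetical helper):

# sketch of the closure approach: output and its graph are only referenced
# by locals of run_iteration, so they become unreachable once the call returns
def run_iteration(data, label):
    data = Variable(data.cuda())
    label = Variable(label.cuda())
    output = net(data)
    return output.data  # hand back only the tensor, not the graph

net.train()
for batch_idx, (data, label) in enumerate(trainloader):
    result = run_iteration(data, label)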

Either way, if you are not using the computation graph to do backward, you shouldn’t build it in the first place. So I would suggest using Variable(..., volatile=True) (or torch.no_grad() on master).
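For instance, a sketch of both variants on the loop from the question:

# pre-0.4 style: volatile inputs tell autograd not to build a graph at all
for batch_idx, (data, label) in enumerate(trainloader):
    data = Variable(data.cuda(), volatile=True)
    output = net(data)

# on master (0.4+), the equivalent is the no_grad context manager
for batch_idx, (data, label) in enumerate(trainloader):
    with torch.no_grad():
        output = net(data.cuda())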
