CUDA memory increases twice, then stops increasing


I have the code below and I don't understand why the memory usage increases twice and then stops increasing.

I searched the forum but could not find an answer.

env: PyTorch 0.4.1, Ubuntu16.04, Python 2.7, CUDA 8.0/9.0

from torchvision.models import vgg16
import torch
import pdb

net = vgg16().cuda()
data1 = torch.rand(16,3,224,224).cuda()

for i in range(10):
    out1 = net(data1)

1. First stop: this is what data1 and vgg16 take.
2. Second stop: this is what the intermediate state of vgg16 takes.
3. Third stop: WHY does it increase again?
4. Fourth stop: WHY does it stop increasing?

(colesbury) #2

The memory is taken by the output out1 and the intermediate activations needed to compute the gradient. The first increase comes from computing out1. The second increase comes from computing net(data1) while the previous out1 is still alive. The reason is that in:

out1 = net(data1)

the right-hand side net(data1) is evaluated before the assignment, so the old out1 (and the state it holds) is still alive while the new output is being computed. Memory usage, as reported by the system, doesn't generally decrease, because PyTorch's caching allocator keeps freed blocks rather than returning them to the driver. If it did decrease, it would drop back to 2872Mi after the assignment.
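The assignment semantics behind this can be illustrated without a GPU. The sketch below uses weakref to track liveness; the Activation class is a hypothetical stand-in for an output tensor plus its autograd state, not PyTorch API:

```python
import weakref

class Activation:
    """Hypothetical stand-in for a network output plus the autograd
    state it keeps alive; not part of the original code."""
    pass

def forward(old_ref=None):
    # While the right-hand side of `out1 = net(data1)` is evaluated,
    # the previous out1 is still referenced, so it cannot be freed yet.
    if old_ref is not None:
        assert old_ref() is not None
    return Activation()

out1 = forward()
old = weakref.ref(out1)

out1 = forward(old)  # both old and new outputs are alive during this call

# After the assignment rebinds out1, the old output has no references
# left and is freed immediately (CPython reference counting).
assert old() is None
```

This is exactly the double-occupancy window: the peak happens while the second forward pass runs, before the name out1 is rebound.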

You can rewrite your program to avoid keeping two versions of out1 alive at once:

def evaluate(network, batch):  # renamed to avoid shadowing the builtins eval/input
    out1 = network(batch)
    # maybe use out1 here

for i in range(10):
    evaluate(net, data1)

As long as you don't return out1 from evaluate, out1 (and the state it holds) will be freed when the function returns, before the next call, so you'll only use 2872Mi.
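If you would rather keep the loop flat, a common alternative is to drop the reference explicitly with del before the next forward pass. A pure-Python sketch of the effect (Activation is again a hypothetical stand-in, not PyTorch API):

```python
import weakref

class Activation:
    """Hypothetical stand-in for an output tensor and its autograd state."""
    pass

def forward():
    return Activation()

out1 = forward()
ref = weakref.ref(out1)

# Dropping the only reference frees the old output (and, in PyTorch,
# the graph and intermediate activations it holds) right away...
del out1
assert ref() is None

# ...so the next forward pass never overlaps with the previous output.
out1 = forward()
```

In the original loop this would be a `del out1` at the end of each iteration.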


But why did it not increase a third time?

Is there some optimization in PyTorch's compiler, e.g., running the first backward pass and the second forward pass at the same time? This is the only reason I can think of for why there are two copies.

(colesbury) #4

The old value in out1 gets deleted because there are no longer any references to it. It holds onto all the internal state needed to compute the gradient, so when it gets deleted, that internal state gets deleted too. That is why the usage levels off after the second iteration: at any point there are at most two outputs alive, never three.
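This ownership chain can be sketched in plain Python: the output holds the graph, so freeing the output frees the graph with it. Graph and Output below are hypothetical stand-ins; only the grad_fn attribute name mirrors PyTorch:

```python
import weakref

class Graph:
    """Hypothetical stand-in for the autograd graph and saved activations."""
    pass

class Output:
    """Hypothetical stand-in for a tensor; grad_fn keeps the graph alive."""
    def __init__(self):
        self.grad_fn = Graph()

out = Output()
graph_ref = weakref.ref(out.grad_fn)

out = Output()  # rebinding deletes the old output...
assert graph_ref() is None  # ...and the graph it held is freed with it
```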