Potential Memory Leakage

Hello, I am facing a potential GPU memory leakage problem, so I made a simple test to measure the memory:

def train():
    for batch in train_loader:
        print("before: %.2f MB" % (torch.cuda.memory_allocated() / 1024 / 1024), flush=True)
        batch = batch.to(device)
        out = model(batch.x)
        loss = F.nll_loss(out, batch.y)
        print("middle: %.2f MB" % (torch.cuda.memory_allocated() / 1024 / 1024), flush=True)
        loss.backward()  # frees the intermediate activations once gradients are computed
        optimizer.step()
        optimizer.zero_grad()
        print("after: %.2f MB" % (torch.cuda.memory_allocated() / 1024 / 1024), flush=True)
    return float(loss)
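As an aside, gaps between these printouts are easier to interpret if the caching allocator's reserved memory is printed next to the allocated memory. A small sketch of such a helper (the `report` name is made up, not part of any PyTorch API):

```python
import torch

def report(tag: str) -> str:
    """Return a one-line memory report for the current CUDA device.

    memory_allocated() counts bytes held by live tensors, while
    memory_reserved() counts what the caching allocator has claimed from
    the driver; the reserved pool does not shrink unless
    torch.cuda.empty_cache() is called, so the two can differ a lot.
    """
    if not torch.cuda.is_available():
        return "%s: CUDA not available" % tag
    alloc = torch.cuda.memory_allocated() / 1024 / 1024
    reserved = torch.cuda.memory_reserved() / 1024 / 1024
    return "%s: allocated %.2f MB, reserved %.2f MB" % (tag, alloc, reserved)

print(report("before"))
```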

And in my model's forward, I also added a memory measurement just before the return:

def forward(self, x):
    # ... model computation producing `out` ...
    print("allocated: %.2f MB" % (torch.cuda.memory_allocated() / 1024 / 1024), flush=True)
    return out

Then the output seems weird. For the first several batches:

before: 0.00 MB
allocated: 918.37 MB
middle: 900.69 MB
after: 46.32 MB
before: 46.32 MB
allocated: 1496.60 MB
middle: 1468.49 MB
after: 109.55 MB
before: 109.55 MB
allocated: 571.27 MB
middle: 562.04 MB
after: 129.10 MB

but after several batches, the values increase beyond what they should be and then become stable:

before: 7077.44 MB
allocated: 7951.10 MB
middle: 7933.39 MB
after: 7077.51 MB
before: 7077.51 MB
allocated: 8597.35 MB
middle: 8566.14 MB
after: 7077.21 MB
before: 7077.21 MB
allocated: 8363.79 MB
middle: 8337.63 MB
after: 7077.20 MB

Does memory leakage happen in my code? The gap between allocated and middle in my code is small; is this normal? Any help would be appreciated.

Are you storing any tensors that might still be attached to the computation graph outside of the train method?
I would assume float(loss) would detach the tensor, so I'm unsure what might be causing it.
In case you get stuck, could you try to post a minimal and executable code snippet to reproduce the issue, please?
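To illustrate the kind of leak being asked about here, a minimal sketch (the model and loop are made up): appending the raw loss tensor keeps its whole computation graph alive across iterations, while float(loss) stores a plain Python number and lets the graph be freed.

```python
import torch

model = torch.nn.Linear(10, 1)
losses_leaky, losses_safe = [], []

for _ in range(3):
    x = torch.randn(4, 10)
    loss = model(x).pow(2).mean()
    losses_leaky.append(loss)        # still attached to the graph -> memory grows
    losses_safe.append(float(loss))  # plain Python float, graph can be freed

# Tensors in losses_leaky still carry a grad_fn (graph attached);
# entries in losses_safe are detached Python floats.
print(all(t.grad_fn is not None for t in losses_leaky))  # True
print(all(isinstance(v, float) for v in losses_safe))    # True
```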

Thanks for your reply and sorry for the late response. I have found the problem. When I used the dataloader to get mini-batches, previously I wrote:

for batch in train_loader:
    batch = batch.to(device)

then I switched to:

for batch in train_loader:
    x = batch.x.to(device)
    y = batch.y.to(device)

It works. I found that the second way releases the memory, while the first one does not. I am not sure of the actual reason for that.
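One plausible mechanism can be sketched with a made-up stand-in for the batch object (not the actual loader implementation): when the whole batch object is kept around, every tensor attribute it holds stays alive with it, whereas extracting only .x and .y lets the rest of the object be collected.

```python
import weakref
import torch

class Data:
    """Minimal stand-in (assumption) for the loader's batch object."""
    def __init__(self):
        self.x = torch.randn(8, 16)
        self.y = torch.randint(0, 2, (8,))
        self.extra = torch.randn(1000)  # other per-batch attributes

    def to(self, device):
        self.x = self.x.to(device)
        self.y = self.y.to(device)
        self.extra = self.extra.to(device)
        return self

def keep_whole_object(batch):
    # Holding the returned object keeps .extra (and everything else) alive.
    return batch.to("cpu")

def keep_tensors_only(batch):
    # Only x and y survive; the Data object itself can be freed.
    return batch.x.to("cpu"), batch.y.to("cpu")

b1, b2 = Data(), Data()
ref1, ref2 = weakref.ref(b1), weakref.ref(b2)

kept = keep_whole_object(b1)
del b1
xy = keep_tensors_only(b2)
del b2

print(ref1() is not None)  # True: the whole Data object is still referenced
print(ref2() is None)      # True: only x and y survive; Data was collected
```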

That’s interesting to see. What is your DataLoader returning? It seems to be an object containing internal .x and .y attributes which also provides the to() operation, so it cannot be a plain list.

Correct, it’s a Data object containing each batch’s properties, like .x, .y, etc. It confuses me why this would cause the problem.