Question about GPU memory garbage collection

Problem

Summary: Tensor objects that are no longer referenced still occupy GPU memory.

I monitor GPU memory usage with nvidia-smi.

Code:

import torch
import torch.nn as nn

class FCBlock(nn.Module):
    def __init__(self, in_channel, hidden_channel, out_channel, n_blocks):
        super().__init__()
        self.net = nn.Sequential()
        self.net.append(nn.Linear(in_channel, hidden_channel))
        self.net.append(nn.ReLU())
        for _ in range(n_blocks):
            self.net.append(nn.Linear(hidden_channel, hidden_channel))
            self.net.append(nn.ReLU())
        self.net.append(nn.Linear(hidden_channel, out_channel))

    def forward(self, x):
        return self.net(x)

def T_forward(surf, y, direction):
    # F is the scalar field value at y, g its gradient w.r.t. y
    F, g = F_forward(surf, y)
    g = g / g.norm(dim=-1, keepdim=True).clamp(min=1e-6)
    return y + g * _D(y) * direction  # _D is a helper defined elsewhere in the project

def F_forward(surf, x):
    x.requires_grad = True
    F = surf(x)
    # create_graph=True keeps the graph so the gradient itself is differentiable
    Fx = torch.autograd.grad(F, [x], grad_outputs=torch.ones_like(F),
                             retain_graph=True, create_graph=True)[0]
    return F, Fx

def test_leak(surf):
    y = torch.randn((8192, 100, 3)).to(V().cfg.device)  # V() is the project's config singleton (not shown)
    direction = torch.ones((8192,)).view(8192, 1, 1).to(V().cfg.device)
    T_forward(surf, y, direction)  # consumes 20 GB of GPU memory and stays unreleased
    T_forward(surf, y, direction)  # consumes another 20 GB and also stays unreleased

if __name__ == "__main__":
    surf = FCBlock(3, 512, 1, 2).to(V().cfg.device)  # model must be on the same device as the inputs
    test_leak(surf)
    print('hi')
    while True: pass  # keep the process alive so nvidia-smi can be inspected

Phenomenon:
(1) The first T_forward call consumes 20 GB of GPU memory, which stays unreleased.
(2) The second T_forward call consumes another 20 GB of GPU memory, which also stays unreleased.
(3) Neither allocation is released, even after test_leak returns.

Analysis:
I recall that Python's garbage collector automatically collects objects with zero references. But it does not seem to work in this code: after test_leak returns, no tensor object is referenced anymore, so all GPU tensors should be released. Yet nvidia-smi tells me they are not.
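
As a sanity check (a minimal sketch, assuming a CUDA device and the surf/test_leak definitions from the code above), comparing torch.cuda.memory_allocated() with torch.cuda.memory_reserved() shows whether the tensors themselves are still alive or whether the memory is merely held by the allocator:

import torch

def report(tag):
    # memory_allocated(): bytes currently held by live tensors
    # memory_reserved(): bytes held by PyTorch's caching allocator,
    # which is what nvidia-smi attributes to the process
    alloc = torch.cuda.memory_allocated() / 1024**3
    reserved = torch.cuda.memory_reserved() / 1024**3
    print(f"{tag}: allocated={alloc:.2f} GiB, reserved={reserved:.2f} GiB")

report("before")
test_leak(surf)            # surf and test_leak as defined above
report("after test_leak")  # allocated should drop once the tensors are collected,
                           # while reserved (the nvidia-smi number) stays high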

Why?

Environment

torch 2.0.1+cu118
Ubuntu 22.04

In PyTorch, when you move tensors to the GPU, the system has to allocate GPU memory for them. Since allocating and deallocating memory is an expensive operation time-wise, when the tensors are no longer used PyTorch keeps this memory allocated as a form of cache, in order to reuse it for future tensors without the overhead of allocating new memory. That cached memory is what nvidia-smi reports as used by the process. You can manually release the cache with torch.cuda.empty_cache(), at the cost of said overhead when allocating new tensors in the future.
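
For illustration, a minimal sketch of that behavior (the tensor size here is an assumption for the example, not the 20 GB from the question):

import torch

x = torch.randn(1024, 1024, 256, device="cuda")  # ~1 GiB of float32
print(torch.cuda.memory_allocated())             # ~1 GiB held by the live tensor
print(torch.cuda.memory_reserved())              # >= allocated; what nvidia-smi shows

del x                                            # tensor is garbage-collected ...
print(torch.cuda.memory_allocated())             # ... so allocated drops to ~0
print(torch.cuda.memory_reserved())              # ... but the cache is kept

torch.cuda.empty_cache()                         # return cached blocks to the driver
print(torch.cuda.memory_reserved())              # now the nvidia-smi usage drops too

Note that empty_cache() only releases cached blocks that are not occupied by live tensors; memory still referenced by tensors is unaffected.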
