GPU memory leak in a function during training

GPU memory leaks and keeps increasing until I get RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 31.75 GiB total capacity; 28.41 GiB already allocated; 4.00 MiB free; 30.51 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
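
(Side note: as far as I understand, the max_split_size_mb option mentioned in the error is passed through the PYTORCH_CUDA_ALLOC_CONF environment variable before CUDA is initialized, roughly as sketched below; it only mitigates fragmentation, though, so I would not expect it to fix an actual leak.)

# Sketch only: set the allocator option mentioned in the error message.
# The value 128 is an arbitrary example, not a recommendation.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # import torch only after setting the variable so the allocator picks it up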

I have a function to normalize point sets, and the leak appears to be caused by this function:

def normalize_pointset(x):
    """x has shape (B, 3 or 4, N)"""
    if x.shape[1] == 4:
        # homogeneous -> Cartesian; epsilon in the denominator avoids division by zero
        points = x[:, :3, :] / (x[:, 3, :].unsqueeze(1) + 1e-7)
    else:
        points = x
    del x
    points -= points.mean(dim=-1).unsqueeze(-1)   # center to origin
    points /= (points[:, 0, :]**2 + points[:, 1, :]**2 + points[:, 2, :]**2).max(dim=1).values.view(-1, 1, 1)   # scale by the max squared norm (intent: max length = 1)
    return points

Could you post a minimal executable code snippet showing the leak, i.e. an increase in memory which cannot be released and is lost, please?
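
For example, a rough check along these lines (just a sketch, assuming the normalize_pointset from your post is defined and the shapes are placeholders) would already show whether the allocated memory keeps climbing when the function is called in isolation:

import torch

x = torch.randn(24, 4, 122880, device="cuda")
for i in range(100):
    out = normalize_pointset(x)   # with 4 channels the input itself is not modified in-place
    del out
    torch.cuda.synchronize()
    # allocated memory should stay roughly constant here if nothing is leaking
    print(i, torch.cuda.memory_allocated() / 1024**2, "MiB allocated")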

I made a minimal working example, but this one doesn't leak memory the way my original code does:

import torch
from torch import nn, optim

def normalize_pointset(x):
    """x has shape (B, 3 or 4, N)"""
    if x.shape[1] == 4:
        # homogeneous -> Cartesian; epsilon in the denominator avoids division by zero
        points = x[:, :3, :] / (x[:, 3, :].unsqueeze(1) + 1e-7)
    else:
        points = x
    del x
    points -= points.mean(dim=-1).unsqueeze(-1)   # center to origin
    points /= (points[:, 0, :]**2 + points[:, 1, :]**2 + points[:, 2, :]**2).max(dim=1).values.view(-1, 1, 1)   # scale by the max squared norm (intent: max length = 1)
    return points

class MyLinear(nn.Module):
    def __init__(self):
        super().__init__()
        self.id = nn.Identity()
        self.b = nn.Parameter(torch.randn(24, 3, 122880))

    def forward(self, input):
        return self.id(input) + self.b

model = MyLinear().cuda()
optimizer = optim.Adam(model.parameters(), 1e-4)
for i in range(10000):
    data1 = torch.randn(24,4,122880).cuda()
    data2 = torch.randn(24,4,122880).cuda()

    optimizer.zero_grad()
    loss = ((model(normalize_pointset(data1)) - normalize_pointset(data2))**2).mean()
    loss.backward()
    optimizer.step()
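
For reference, this is only a sketch of the logging I could add at the end of each iteration to watch for growth (the helper name is just for illustration; it is not part of my original code):

import torch

def log_cuda_memory(step):
    # Print the PyTorch CUDA allocator counters so growth across iterations becomes visible.
    allocated = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    peak = torch.cuda.max_memory_allocated() / 1024**2
    print(f"iter {step}: allocated {allocated:.1f} MiB, reserved {reserved:.1f} MiB, peak {peak:.1f} MiB")

If the allocated memory stays flat in this example but keeps growing in the full training script, the difference is probably in something the example leaves out (for instance, tensors that still hold a computation graph being kept across iterations).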

When I run my code without the normalize_pointset function, it works fine (no memory leak).
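
In case the in-place subtraction and division turn out to matter, this is an out-of-place variant of the same function that I could test for comparison (same math; the new name is only for the test):

def normalize_pointset_out_of_place(x):
    """Same normalization as above, but without in-place ops (sketch for comparison only)."""
    if x.shape[1] == 4:
        points = x[:, :3, :] / (x[:, 3, :].unsqueeze(1) + 1e-7)
    else:
        points = x
    points = points - points.mean(dim=-1, keepdim=True)   # center to origin
    scale = (points[:, 0, :]**2 + points[:, 1, :]**2 + points[:, 2, :]**2).max(dim=1).values
    points = points / scale.view(-1, 1, 1)   # same scaling as the original
    return points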

OK, let me know once you are able to narrow it down and provide a code snippet which shows the leak, and ping me, please.