Memory leak (no_grad) when accumulating dataloader outputs

Hi, we had a large RAM leak during GPU inference and narrowed down the culprit. It's not critical for us, as we found a quick workaround, but I figured I would post it here in case it helps someone else or leads to a better fix.

Adding a sample snippet from our code:

    with torch.no_grad():
        for batch_num, data in enumerate(dataloader):
            out = self.forward(data)[0]
            store_out.append(out.detach().cpu().numpy())

            # these would leak
            # gt_batch = data['gt'].detach()
            # gt_batch = data['gt'].detach().cpu().numpy()

            # this doesn't leak
            gt_batch = data['gt'].detach().cpu().numpy().copy()

            store_gt.append(gt_batch)

  • Model in eval mode, everything under torch.no_grad().
  • Iterating through the dataloader, we store the prediction and some loader data for each batch in a list (to be torch.cat'd after the loop) => memory quickly overflowed to crash levels. A stripped-down sketch of the pattern follows this list.
  • Accumulating predictions alone was fine (i.e. no data taken directly from the dataloader).
  • Accumulating data from the dataloader caused the leak.
  • We tried .detach() => leak.
  • We tried .detach().cpu().numpy() => leak.
  • We tried .detach().cpu().numpy().copy() => no leak, hurrah!
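
In case it's useful, here's a stripped-down, self-contained sketch of that accumulation pattern with the workaround in place (dummy model, dataset, and variable names, not our actual code):

    import numpy as np
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Dummy model and dataset -- hypothetical stand-ins for the real ones above.
    model = torch.nn.Linear(8, 2).eval()
    dataset = TensorDataset(torch.randn(64, 8), torch.randn(64, 2))
    dataloader = DataLoader(dataset, batch_size=16)

    store_out, store_gt = [], []
    with torch.no_grad():
        for x, gt in dataloader:
            out = model(x)
            store_out.append(out.cpu().numpy())
            # .copy() gives the array its own buffer instead of a view into
            # the batch tensor's storage -- this is the workaround that
            # stopped the accumulation from ballooning for us
            store_gt.append(gt.cpu().numpy().copy())

    all_out = np.concatenate(store_out)
    all_gt = np.concatenate(store_gt)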

So something in the dataloader output seems to keep the graph alive.
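My guess (not verified against the DataLoader internals) is that .numpy() returns an array sharing memory with the batch tensor, so holding on to the array keeps the whole batch alive, while .copy() breaks that link. A quick check of the sharing behaviour:

    import torch

    t = torch.zeros(4)

    view = t.numpy()          # shares the tensor's storage
    copy = t.numpy().copy()   # owns its own buffer

    print(view.flags['OWNDATA'])  # False -> still backed by the tensor
    print(copy.flags['OWNDATA'])  # True  -> independent allocation

    t.add_(1)                 # in-place change to the tensor...
    print(view)               # [1. 1. 1. 1.]  ...is visible through the shared view
    print(copy)               # [0. 0. 0. 0.]  ...but not through the copy

Hope this helps someone somewhere!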
