Memory leak (no_grad) when accumulating dataloader outputs

Hi, we had a large RAM leak during GPU inference and narrowed down the culprit. It's not critical for us, as we found a quick workaround, but I figured I would post it here in case it helps someone else or leads to a better fix.

Adding a sample snippet from our code:

    with torch.no_grad():
        for batch_num, data in enumerate(dataloader):
            out = self.forward(data)[0]
            store_out.append(out.detach().cpu().numpy())

            # these would leak
            # gt_batch = data['gt'].detach()
            # gt_batch = data['gt'].detach().cpu().numpy()

            # this doesn't leak
            gt_batch = data['gt'].detach().cpu().numpy().copy()

            store_gt.append(gt_batch)

  • Model in eval mode, everything under torch.no_grad().
  • Iterating through the dataloader, we store the prediction and some loader data for each batch in a list (to be torch.cat'd after the loop) => memory quickly overflowed to crash levels. A stripped-down sketch of the pattern follows this list.
  • Accumulating predictions alone was fine (i.e. no data taken directly from the dataloader).
  • Accumulating data from the dataloader caused the leak.
  • We tried .detach() => leak.
  • We tried .detach().cpu().numpy() => leak.
  • We tried .detach().cpu().numpy().copy() => no leak, hurrah!
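
In case it's useful, here's a stripped-down, self-contained sketch of that accumulation pattern with the workaround in place (dummy model, dataset, and variable names, not our actual code):

    import numpy as np
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Dummy model and dataset -- hypothetical stand-ins for the real ones above.
    model = torch.nn.Linear(8, 2).eval()
    dataset = TensorDataset(torch.randn(64, 8), torch.randn(64, 2))
    dataloader = DataLoader(dataset, batch_size=16)

    store_out, store_gt = [], []
    with torch.no_grad():
        for x, gt in dataloader:
            out = model(x)
            store_out.append(out.cpu().numpy())
            # .copy() gives the array its own buffer instead of a view into
            # the batch tensor's storage -- this is the workaround that
            # stopped the accumulation from ballooning for us
            store_gt.append(gt.cpu().numpy().copy())

    all_out = np.concatenate(store_out)
    all_gt = np.concatenate(store_gt)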

So something in the dataloader output seems to keep the graph alive.
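My guess (not verified against the DataLoader internals) is that .numpy() returns an array sharing memory with the batch tensor, so holding on to the array keeps the whole batch alive, while .copy() breaks that link. A quick check of the sharing behaviour:

    import torch

    t = torch.zeros(4)

    view = t.numpy()          # shares the tensor's storage
    copy = t.numpy().copy()   # owns its own buffer

    print(view.flags['OWNDATA'])  # False -> still backed by the tensor
    print(copy.flags['OWNDATA'])  # True  -> independent allocation

    t.add_(1)                 # in-place change to the tensor...
    print(view)               # [1. 1. 1. 1.]  ...is visible through the shared view
    print(copy)               # [0. 0. 0. 0.]  ...but not through the copy

Hope this helps someone somewhere!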
