Hi, we had a large RAM leak during GPU inference and narrowed down the culprit. Not critical for us since we found a quick workaround, but I figured I would post it here in case it helps someone else or leads to a better fix.
Adding a sample snippet from our code:
with torch.no_grad():
    for batch_num, data in enumerate(dataloader):
        out = self.forward(data)[0]
        store_out.append(out.detach().cpu().numpy())
        # these would leak:
        # gt_batch = data['gt'].detach()
        # gt_batch = data['gt'].detach().cpu().numpy()
        # this doesn't leak:
        gt_batch = data['gt'].detach().cpu().numpy().copy()
        store_gt.append(gt_batch)
- Model in eval mode, everything wrapped in torch.no_grad()
- Iterating through the dataloader, we store the prediction and some loader data for each batch in a list, to be concatenated after the loop (see the fuller sketch below)
=> memory usage quickly climbed until the process crashed.
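For completeness, here is roughly what the whole loop looks like with the workaround in place, rewritten as a standalone function for the sake of the sketch (the function name, passing the model in explicitly, and the np.concatenate step are just illustration; in our code this lives in a method, as in the snippet above):

import numpy as np
import torch

def run_inference(model, dataloader):
    # Sketch: eval mode + no_grad, accumulate per-batch results, concatenate once at the end.
    model.eval()
    store_out, store_gt = [], []
    with torch.no_grad():
        for data in dataloader:
            out = model(data)[0]
            store_out.append(out.detach().cpu().numpy())
            # .copy() is the workaround: without it the stored array keeps the batch alive
            store_gt.append(data['gt'].detach().cpu().numpy().copy())
    return np.concatenate(store_out), np.concatenate(store_gt)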
Accumulating predictions alone was fine (i.e. nothing stored directly from the dataloader).
Accumulating data from the dataloader caused the leak.
We tried .detach() => leak.
We tried .detach().cpu().numpy() => leak.
We tried .detach().cpu().numpy().copy() => no leak, hurrah!
So something in the dataloader output seems to keep the whole batch alive; our guess is that .numpy() returns a view sharing memory with the batch tensor, so only an explicit .copy() lets the batch actually be freed.
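A quick standalone way to see that sharing (illustration only, not from our code):

import torch

t = torch.zeros(3)        # stand-in for a tensor coming out of the dataloader
view = t.numpy()          # no copy: the array shares t's storage
owned = t.numpy().copy()  # independent buffer

t += 1
print(view)   # [1. 1. 1.] -> still backed by t's storage, so keeping it keeps t alive
print(owned)  # [0. 0. 0.] -> detached copy, t can be garbage collected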
Hope that helps someone somewhere