Hi, I noticed that while training a PyTorch model, the subprocesses started by the DataLoader workers accumulate memory over time as new batches are loaded, and this memory never seems to be released, ultimately resulting in a “DataLoader worker does not have sufficient shared memory” error. Could this be a memory leak, or is it a known bug?
The PyTorch version I am using is 2.0 for CUDA 11.7, in case this information helps.
Thanks a lot!
Could you check if you are running into this issue?
Yes, I already checked this issue; unfortunately, it didn’t help. What I notice is that even if I set num_workers=0, the shared memory of the process that runs the script keeps increasing until the shared-memory error appears.
Shared memory shouldn’t be used if no multiprocessing is needed in the DataLoaders. Are you manually sharing tensors somewhere in your code?
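For reference, the usual way a tensor ends up in shared memory explicitly is via Tensor.share_memory_(), which moves the tensor’s storage into shared memory (backed by /dev/shm on Linux). A quick way to check whether a tensor in your code is shared:

```python
import torch

t = torch.zeros(3)
print(t.is_shared())  # a freshly created CPU tensor is not shared

t.share_memory_()     # moves the storage into shared memory in place
print(t.is_shared())  # now True; this storage counts against /dev/shm
```

Searching your code for share_memory_() calls (or tensors passed through torch.multiprocessing, which shares them implicitly) can help rule this out.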
I am also observing a shared memory leak in the parent process. It keeps growing way beyond the sum of shm in the dataloader workers.
I also verified that it is not related to the dataloader.
OK, there seems to be a bug when running a whole script under a torch.inference_mode() context. Using inference mode only on the model calls works fine and the leak is gone. I’ll try to make a minimal reproducible example.
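To illustrate the two patterns being compared (the model and input below are toy placeholders, not the original training code): the leak was observed when the entire script, including data loading, ran inside inference_mode, while scoping the context to just the forward pass avoided it.

```python
import torch

model = torch.nn.Linear(4, 2)   # placeholder model for illustration
x = torch.randn(8, 4)           # placeholder batch

# Pattern that showed the leak (sketch): the whole loop, including
# DataLoader iteration, wrapped in one inference_mode context.
# with torch.inference_mode():
#     for batch in loader:
#         out = model(batch)

# Workaround: keep data loading outside and enter inference_mode
# only around the model call itself.
with torch.inference_mode():
    out = model(x)

print(out.requires_grad)  # inference-mode outputs carry no autograd state
```

The output tensor’s requires_grad is False either way; the difference is only in how much of the script runs under the context manager.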