Hi, I am experiencing a RAM (not VRAM) memory leak while calling next(iter(dataloader)) on a short-lived DataLoader.
I am training a model in the context of Meta-Learning Few-Shot Classification and I have to randomly sample different classes AND images of those classes for each step of my training loop.
I am therefore creating a new DataLoader for each training step, which contains only a single batch.
I have already seen the thread below, which addresses a similar problem, but I don't think it applies to my case because I have to create a new DataLoader per training step: Get a single batch from DataLoader without iterating · Issue #1917 · pytorch/pytorch · GitHub
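If I understand that thread correctly, its suggestion is roughly the pattern below: keep one DataLoader and one iterator alive for the whole run, and only re-create the iterator when an epoch is exhausted (a minimal sketch with toy data, not my actual code):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy fixed dataset, just for illustration
dataset = TensorDataset(torch.randn(1000, 3, 32, 32), torch.randint(0, 10, (1000,)))

loader = DataLoader(dataset, batch_size=32, shuffle=True)
it = iter(loader)
for step in range(200):
    try:
        X, Y = next(it)
    except StopIteration:
        it = iter(loader)  # epoch exhausted: restart with the SAME loader
        X, Y = next(it)
```

This only works when the underlying dataset stays fixed across steps, which is exactly what I don't have.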
I do the following per training step:
```python
# Create my task (collection of randomly sampled classes)
task = task_type(meta_train_classes, ...)

# Fetch DataLoader for that task
dataloader = fetch_dataloaders('train', task)
...

# Iterate dataloader (causes memory leak)
X_sup, Y_sup = next(iter(dataloader))
```
I also use numpy arrays in my custom Dataset class (and from memory profiling, the memory does not seem to be leaking from there).
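For reference, the Dataset is along these lines (a simplified sketch; the real class and field names differ):

```python
import numpy as np
import torch
from torch.utils.data import Dataset

# Simplified sketch of my custom Dataset: it holds numpy arrays and
# converts one sample at a time to tensors in __getitem__.
class TaskDataset(Dataset):
    def __init__(self, images: np.ndarray, labels: np.ndarray):
        self.images = images  # e.g. (N, H, W, C) uint8 array
        self.labels = labels  # (N,) int64 array

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        x = torch.from_numpy(self.images[idx]).float()
        y = torch.tensor(int(self.labels[idx]))
        return x, y
```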
I am wondering whether I could be doing this some other way to prevent the leak. I have to train my model for a very large number of steps, and my 64GB of RAM fills up slowly, but fast enough that I cannot fully train the model.
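Would something along these lines be a sane workaround, i.e. hand-rolling the one batch I need per step and skipping the DataLoader machinery entirely? A rough sketch (sample_batch and the toy data are illustrative, not my actual code):

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset

def sample_batch(dataset, batch_size):
    # Draw one random batch directly from a map-style dataset,
    # without constructing a DataLoader at all.
    idx = np.random.choice(len(dataset), size=batch_size, replace=False)
    samples = [dataset[int(i)] for i in idx]
    X = torch.stack([x for x, _ in samples])
    Y = torch.stack([y for _, y in samples])
    return X, Y

# Per training step (toy data for illustration):
dataset = TensorDataset(torch.randn(100, 3, 32, 32), torch.randint(0, 5, (100,)))
X_sup, Y_sup = sample_batch(dataset, batch_size=16)
```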