Is it possible to free up a DataLoader?

I am using separate DataLoaders for the train set and the test set, so in all I have two DataLoaders.

I do training and testing in every epoch.

Is there a way to free up the DataLoader that is not being used (e.g. free up the train DataLoader while testing, and the test DataLoader while training) so that I can increase the batch size of the one being used?

From my understanding, the DataLoader is just a wrapper around your train/test set, and it is the train and test sets themselves that eat the memory. So you would need to delete the train/test set to free up the allocated memory. I could be wrong, but I guess it could be done like this:

```python
test_loader = None
del test_loader
test_set = None
del test_set
```

However, you will have to re-create the deleted dataset and its associated loader before using them again. Moreover, you will have to weigh the reloading time of these sets against the memory you gain from deleting them.
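
A slightly fuller sketch of that idea, using a throwaway `TensorDataset` as a stand-in for the real test set (the `torch.cuda.empty_cache()` call is an optional extra and only helps if cached GPU memory is the problem):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in test set; in practice this would be your own Dataset.
test_set = TensorDataset(torch.randn(1000, 3, 32, 32), torch.randint(0, 10, (1000,)))
test_loader = DataLoader(test_set, batch_size=64)

# ... run the test phase with test_loader ...

# Drop the Python references so the objects become garbage-collectable.
del test_loader
del test_set
torch.cuda.empty_cache()  # optional: release cached GPU memory back to the driver

# The dataset and loader have to be re-created before the next test phase,
# so the reload time has to be weighed against the memory freed here.
```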

I think the DataLoader is just a generator. What takes memory are the model weights and the data being yielded. The memory for the model weights will always be there, since you are training them all the time. As for the data yielded by the DataLoader, it is freed at the end of every iteration since it is a local variable. So I think there is no need to free the DataLoader.
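
One way to sanity-check that claim is to watch the allocated GPU memory per iteration; if the yielded batches were accumulating, the number would keep growing. A minimal sketch with a dummy dataset (not from the original post):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

loader = DataLoader(TensorDataset(torch.randn(256, 3, 32, 32)), batch_size=32)

for (images,) in loader:
    if torch.cuda.is_available():
        images = images.cuda()
        # `images` is rebound on the next iteration, so the previous batch
        # becomes unreachable and its memory can be reused.
        print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.1f} MiB")
```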

Hi @Deeply, thanks for the suggestion. I tried this, but it is not working; the GPU still runs out of memory.

Hi @chenglu, thanks for the answer. I think the training and testing batch sizes matter. I tried a larger test batch size and it didn't work, but when I reduce the test batch size, it works. So it seems that even if the DataLoader is emptied out at the end of every epoch, it still occupies memory.

Have you used `with torch.no_grad()` during your test phase? This avoids storing the intermediate variables needed for the backward pass, which is not necessary for testing.
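
For context, a minimal evaluation-loop sketch along these lines; the model, loader, and accuracy metric below are stand-ins, not the original poster's code:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(10, 2)  # stand-in model
test_loader = DataLoader(
    TensorDataset(torch.randn(128, 10), torch.randint(0, 2, (128,))),
    batch_size=64,
)

model.eval()  # switch layers like dropout/batchnorm to eval behaviour
correct = 0
with torch.no_grad():  # no graph is built, so intermediate activations are not kept
    for inputs, targets in test_loader:
        outputs = model(inputs)
        correct += (outputs.argmax(dim=1) == targets).sum().item()
print(f"accuracy: {correct / 128:.2%}")
```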


Hi @ptrblck, thanks a lot! I am able to increase the batch size after using `with torch.no_grad()`.

Hi all,

I am running into what I think is a memory leak. I am not sure why my memory isn't getting freed up while I iterate through my DataLoader during inference. If anyone has ideas, please let me know!

**More Details**

I noticed that, according to the tracemalloc package, the memory appears to accumulate at:

```python
def __getitem__(self, index):
    ...
    # Convert to float32; the image is also normalized to [0, 1].
    image = (image / 255).astype(np.float32)
    mask = mask.astype(np.float32)
    ...
```

Iteration through the inference DataLoader:

```python
...
    tracemalloc.start()

    snapshot1 = tracemalloc.take_snapshot()
    for (batch_idx, batch) in tqdm(enumerate(loader), total=len(loader)):
        try:
            if batch_idx % 10 == 0:
                snapshot1 = tracemalloc.take_snapshot()

            log.info(f"Running inference on {batch['image_path']}")

            batch_pred = inference_step(
                model, checkpoint_fp, batch, thresh, model_loaded=True
            )

            res = inference_ds.save_prediction_batch(
                batch_pred,
                out_folder,
                config=cfg,
                thresh=thresh,
                channel=channel,
            )
            successes.append(res)
            log.info("Inference complete")

            if batch_idx % 10 == 9:
                snapshot2 = tracemalloc.take_snapshot()
                top_stats = snapshot2.compare_to(snapshot1, "lineno")

                log.info("\n\n [ Top 10 differences ]")
                for stat in top_stats[:10]:
                    log.info(f"Snapshot difference: \n {stat}")
...
```

Are you storing tensors that are attached to the computation graph in e.g. a list?
This would not be a leak, but the increased memory usage would be expected in this case (although this behavior is commonly referred to as a "memory leak").
Try to narrow down which part of your code is causing the increase in memory usage, as e.g. `save_prediction_batch` and `successes.append(res)` might store the entire computation graph.
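
If that turns out to be the cause, detaching the predictions (and optionally moving them to the CPU) before storing them is the usual fix. A sketch with a placeholder model and loader, not the poster's actual pipeline:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(10, 2)  # placeholder for the real model
loader = DataLoader(TensorDataset(torch.randn(64, 10)), batch_size=8)

results = []
with torch.no_grad():  # inference only: no graph is attached to the outputs
    for (inputs,) in loader:
        preds = model(inputs)
        # detach() drops any remaining graph reference and cpu() releases the
        # GPU copy, so appending the result does not keep the whole graph alive.
        results.append(preds.detach().cpu())
```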

I figured out the issue, @ptrblck!

I was using this package: GitHub - msamogh/nonechucks: Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!

With this issue:

Subtle bug to track down 🙂