How to speed up the data loader

This worked for me, thanks. For HDF5, "file opening has to happen inside of the __getitem__ function of the Dataset wrapper" (https://stackoverflow.com/questions/46045512/h5py-hdf5-database-randomly-returning-nans-and-near-very-small-data-with-multi/52249344#52249344).
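A minimal sketch of that pattern, assuming the HDF5 file holds an "images" dataset (the path and key are placeholders):

    import h5py
    import torch
    from torch.utils.data import Dataset

    class H5Dataset(Dataset):
        def __init__(self, h5_path):
            self.h5_path = h5_path
            self.file = None  # opened lazily, once per worker process
            # Safe to open briefly here: the handle is closed again before
            # the DataLoader forks its workers.
            with h5py.File(h5_path, "r") as f:
                self.length = len(f["images"])

        def __len__(self):
            return self.length

        def __getitem__(self, index):
            # Open inside __getitem__, not __init__, so every DataLoader
            # worker gets its own handle; h5py handles are not fork-safe.
            if self.file is None:
                self.file = h5py.File(self.h5_path, "r")
            return torch.from_numpy(self.file["images"][index])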


I wrote prefetching code and confirmed that it improves the data loader's performance.
My code is based on the implementation here: https://github.com/NVIDIA/apex/blob/f5cd5ae937f168c763985f627bbf850648ea5f3f/examples/imagenet/main_amp.py#L256
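A simplified sketch of that prefetcher pattern: a side CUDA stream copies the next batch to the GPU while the current one is being processed. It assumes the loader yields (input, target) tensor pairs and was built with pin_memory=True; train_loader and train_step are placeholders:

    import torch

    class DataPrefetcher:
        def __init__(self, loader):
            self.loader = iter(loader)
            self.stream = torch.cuda.Stream()  # dedicated copy stream
            self.preload()

        def preload(self):
            try:
                self.next_input, self.next_target = next(self.loader)
            except StopIteration:
                self.next_input = None
                self.next_target = None
                return
            with torch.cuda.stream(self.stream):
                # non_blocking copies only overlap with compute when the
                # source tensors live in pinned (page-locked) memory
                self.next_input = self.next_input.cuda(non_blocking=True)
                self.next_target = self.next_target.cuda(non_blocking=True)

        def next(self):
            # make the default stream wait for the async copies to finish
            torch.cuda.current_stream().wait_stream(self.stream)
            input, target = self.next_input, self.next_target
            self.preload()
            return input, target

    # usage: alternate compute and prefetch until the loader is exhausted
    prefetcher = DataPrefetcher(train_loader)
    input, target = prefetcher.next()
    while input is not None:
        train_step(input, target)
        input, target = prefetcher.next()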

However, if you run the program on your local machine, I highly recommend buying an NVMe drive (e.g., https://www.amazon.com/Samsung-950-PRO-Internal-MZ-V5P256BW-x/dp/B015SOI392). For me, this investment completely solved the problem of slow image loading.


So, the solution is to use DALI and then change:
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
to:
normalize = transforms.Normalize(mean=[0.485*255, 0.456*255, 0.406*255], std=[0.229*255, 0.224*255, 0.225*255])

?
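i.e. something like this sketch, with the normalization moved into DALI's own pipeline (this assumes the newer DALI fn API; the data path and batch settings are placeholders). DALI decodes to uint8 in [0, 255], which is why the ImageNet mean/std get scaled by 255:

    from nvidia.dali import pipeline_def, fn, types

    @pipeline_def
    def train_pipeline(data_dir):
        jpegs, labels = fn.readers.file(file_root=data_dir, random_shuffle=True)
        images = fn.decoders.image(jpegs, device="mixed")  # GPU decode
        images = fn.crop_mirror_normalize(
            images,
            dtype=types.FLOAT,
            output_layout="CHW",
            mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
            std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
        )
        return images, labels

    pipe = train_pipeline(data_dir="/path/to/train", batch_size=64,
                          num_threads=4, device_id=0)
    pipe.build()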

Is DALI helpful in such cases?

If you have any questions or requests, feel free to drop them directly in https://github.com/NVIDIA/DALI.
Sorry, but we are not able to track all the other forum threads about DALI, while we do our best to be responsive on GitHub.

A noticeable speedup with h5py will be seen only when the HDF5 file is written without the chunked option.
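To illustrate the difference at write time (dataset name and shapes are made up): omitting the chunks argument gives h5py's contiguous layout, while a chunked dataset, e.g. for compression, forces whole chunks to be read and decompressed on every access:

    import h5py
    import numpy as np

    data = np.random.rand(1000, 3, 64, 64).astype(np.float32)

    # contiguous layout: fast sequential and random reads
    with h5py.File("contiguous.h5", "w") as f:
        f.create_dataset("images", data=data)

    # chunked layout: needed for compression or resizable datasets, but a
    # single-sample read touches (and decompresses) an entire chunk
    with h5py.File("chunked.h5", "w") as f:
        f.create_dataset("images", data=data,
                         chunks=(1, 3, 64, 64), compression="gzip")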


Hi @Hou_Qiqi, I saw you had a similar problem: you want the dataloader to prefetch data while training is ongoing, basically letting the GPU training and the CPU dataloader run in parallel.

Here is our training loop:

for fi, batch in enumerate(my_data_loader):
    train(batch)  # consume the batch produced by the loader

and in our dataloader we have defined a collate_fn, cook_data, to preprocess each batch:

my_data_loader = DataLoader(my_dataset,
                            num_workers=config['num_dataloader_worker'],
                            batch_size=config['dataloader_batch_size'],
                            timeout=600,
                            collate_fn=cook_data)

We observed that the GPU seems to block, waiting for the dataloader to produce a batch. Is there a way to prefetch as you mentioned? And if we use a map-style dataset rather than an iterable-style one, does it work?
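For reference, a sketch of the standard DataLoader options that enable worker-side prefetching (the values are placeholders, and prefetch_factor is an assumption about a recent PyTorch version; it requires num_workers > 0):

    from torch.utils.data import DataLoader

    # map-style datasets work here: each worker prepares batches ahead of
    # time (prefetch_factor batches per worker), and pin_memory=True
    # enables fast asynchronous host-to-GPU copies
    my_data_loader = DataLoader(my_dataset,
                                num_workers=4,
                                batch_size=64,
                                pin_memory=True,
                                prefetch_factor=2,
                                timeout=600,
                                collate_fn=cook_data)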

I don't recommend solution 1, because .bmp is dramatically storage-consuming (80× the original image size in my case). And can you explain in more detail how to use solution 2?

@Hou_Qiqi, can you share a snippet of how you logged the runtime details of the dataloader?