How to speed up the data loader

This worked for me, thanks. For HDF5, "file opening has to happen inside of the __getitem__ function of the Dataset wrapper" (https://stackoverflow.com/questions/46045512/h5py-hdf5-database-randomly-returning-nans-and-near-very-small-data-with-multi/52249344#52249344).
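A minimal sketch of that pattern, assuming the HDF5 file holds an "images" dataset (the path and key are placeholders):

    import h5py
    import torch
    from torch.utils.data import Dataset

    class H5Dataset(Dataset):
        def __init__(self, h5_path):
            self.h5_path = h5_path
            self.file = None  # opened lazily, once per worker process
            # Safe to open briefly here: the handle is closed again before
            # the DataLoader forks its workers.
            with h5py.File(h5_path, "r") as f:
                self.length = len(f["images"])

        def __len__(self):
            return self.length

        def __getitem__(self, index):
            # Open inside __getitem__, not __init__, so every DataLoader
            # worker gets its own handle; h5py handles are not fork-safe.
            if self.file is None:
                self.file = h5py.File(self.h5_path, "r")
            return torch.from_numpy(self.file["images"][index])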


I wrote prefetching code and confirmed that it improves the data loader's performance.
My code is based on the implementation here: https://github.com/NVIDIA/apex/blob/f5cd5ae937f168c763985f627bbf850648ea5f3f/examples/imagenet/main_amp.py#L256
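A simplified sketch of that prefetcher pattern: a side CUDA stream copies the next batch to the GPU while the current one is being processed. It assumes the loader yields (input, target) tensor pairs and was built with pin_memory=True; train_loader and train_step are placeholders:

    import torch

    class DataPrefetcher:
        def __init__(self, loader):
            self.loader = iter(loader)
            self.stream = torch.cuda.Stream()  # dedicated copy stream
            self.preload()

        def preload(self):
            try:
                self.next_input, self.next_target = next(self.loader)
            except StopIteration:
                self.next_input = None
                self.next_target = None
                return
            with torch.cuda.stream(self.stream):
                # non_blocking copies only overlap with compute when the
                # source tensors live in pinned (page-locked) memory
                self.next_input = self.next_input.cuda(non_blocking=True)
                self.next_target = self.next_target.cuda(non_blocking=True)

        def next(self):
            # make the default stream wait for the async copies to finish
            torch.cuda.current_stream().wait_stream(self.stream)
            input, target = self.next_input, self.next_target
            self.preload()
            return input, target

    # usage: alternate compute and prefetch until the loader is exhausted
    prefetcher = DataPrefetcher(train_loader)
    input, target = prefetcher.next()
    while input is not None:
        train_step(input, target)
        input, target = prefetcher.next()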

However, if you run the program on your local machine, I highly recommend buying an NVMe drive (e.g., https://www.amazon.com/Samsung-950-PRO-Internal-MZ-V5P256BW-x/dp/B015SOI392). For me, this investment completely solved the problem of slow image loading.


So, the solution is to use DALI and then change:
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
to:
normalize = transforms.Normalize(mean=[0.485*255, 0.456*255, 0.406*255], std=[0.229*255, 0.224*255, 0.225*255])

?
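i.e. something like this sketch, with the normalization moved into DALI's own pipeline (this assumes the newer DALI fn API; the data path and batch settings are placeholders). DALI decodes to uint8 in [0, 255], which is why the ImageNet mean/std get scaled by 255:

    from nvidia.dali import pipeline_def, fn, types

    @pipeline_def
    def train_pipeline(data_dir):
        jpegs, labels = fn.readers.file(file_root=data_dir, random_shuffle=True)
        images = fn.decoders.image(jpegs, device="mixed")  # GPU decode
        images = fn.crop_mirror_normalize(
            images,
            dtype=types.FLOAT,
            output_layout="CHW",
            mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
            std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
        )
        return images, labels

    pipe = train_pipeline(data_dir="/path/to/train", batch_size=64,
                          num_threads=4, device_id=0)
    pipe.build()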

Is DALI helpful in such cases?

If you have any questions or requests, feel free to drop them directly in https://github.com/NVIDIA/DALI.
Sorry, but we are not able to track all the other forum threads about DALI, while we do our best to be responsive on GitHub.

A noticeable speedup with h5py will be seen only when the HDF5 file is written without the chunked option.
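To illustrate the difference at write time (dataset name and shapes are made up): omitting the chunks argument gives h5py's contiguous layout, while a chunked dataset, e.g. for compression, forces whole chunks to be read and decompressed on every access:

    import h5py
    import numpy as np

    data = np.random.rand(1000, 3, 64, 64).astype(np.float32)

    # contiguous layout: fast sequential and random reads
    with h5py.File("contiguous.h5", "w") as f:
        f.create_dataset("images", data=data)

    # chunked layout: needed for compression or resizable datasets, but a
    # single-sample read touches (and decompresses) an entire chunk
    with h5py.File("chunked.h5", "w") as f:
        f.create_dataset("images", data=data,
                         chunks=(1, 3, 64, 64), compression="gzip")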


Hi @Hou_Qiqi, I saw you had a similar problem: you want the dataloader to prefetch data while training is ongoing, basically letting the GPU training and the CPU dataloader run in parallel.

Here is our training loop:

for fi, batch in enumerate(my_data_loader):
    train(batch)  # consume the batch produced by the loader

and in our dataloader we have defined a collate_fn, cook_data, to preprocess each batch:

my_data_loader = DataLoader(my_dataset,
                            num_workers=config['num_dataloader_worker'],
                            batch_size=config['dataloader_batch_size'],
                            timeout=600,
                            collate_fn=cook_data)

We observed that the GPU seems to block, waiting for the dataloader to produce a batch. Is there a way to prefetch as you mentioned? And if we use a map-style dataset rather than an iterable-style one, does it work?
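For reference, a sketch of the standard DataLoader options that enable worker-side prefetching (the values are placeholders, and prefetch_factor is an assumption about a recent PyTorch version; it requires num_workers > 0):

    from torch.utils.data import DataLoader

    # map-style datasets work here: each worker prepares batches ahead of
    # time (prefetch_factor batches per worker), and pin_memory=True
    # enables fast asynchronous host-to-GPU copies
    my_data_loader = DataLoader(my_dataset,
                                num_workers=4,
                                batch_size=64,
                                pin_memory=True,
                                prefetch_factor=2,
                                timeout=600,
                                collate_fn=cook_data)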

I don't recommend solution 1, because .bmp is dramatically storage-consuming (80× the original image size in my case). And can you explain in more detail how to use solution 2?

@Hou_Qiqi, can you share a snippet of how you logged the runtime details of the dataloader?