This worked for me. Thanks. With HDF5, “file opening has to happen inside of the __getitem__ function of the Dataset wrapper.” - https://stackoverflow.com/questions/46045512/h5py-hdf5-database-randomly-returning-nans-and-near-very-small-data-with-multi/52249344#52249344
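For anyone landing here, a minimal sketch of that pattern (the file path and the 'images' dataset name are made up); the handle is opened lazily inside __getitem__, so each DataLoader worker gets its own handle after forking:

import h5py
import torch
from torch.utils.data import Dataset

class H5Dataset(Dataset):
    def __init__(self, h5_path):
        self.h5_path = h5_path
        self.file = None  # do NOT open here; workers fork after __init__
        with h5py.File(h5_path, 'r') as f:
            self.length = len(f['images'])

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        # First access inside each worker process opens a private handle.
        if self.file is None:
            self.file = h5py.File(self.h5_path, 'r')
        return torch.from_numpy(self.file['images'][index])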
I wrote prefetching code and confirmed that it improves data loader performance.
My code is based on the implementation here: https://github.com/NVIDIA/apex/blob/f5cd5ae937f168c763985f627bbf850648ea5f3f/examples/imagenet/main_amp.py#L256
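For reference, a stripped-down sketch of that prefetcher (the linked apex version additionally folds the mean/std normalization into the copy): it stages the next batch's host-to-device transfer on a side CUDA stream so it overlaps with compute. Use pin_memory=True in the DataLoader so the non-blocking copies are truly asynchronous.

import torch

class DataPrefetcher:
    # Overlaps host-to-device copies with compute, following apex's data_prefetcher.
    def __init__(self, loader):
        self.loader = iter(loader)
        self.stream = torch.cuda.Stream()
        self.preload()

    def preload(self):
        try:
            self.next_input, self.next_target = next(self.loader)
        except StopIteration:
            self.next_input, self.next_target = None, None
            return
        # Copy the next batch on a side stream while the main stream
        # is still busy computing on the current batch.
        with torch.cuda.stream(self.stream):
            self.next_input = self.next_input.cuda(non_blocking=True)
            self.next_target = self.next_target.cuda(non_blocking=True)

    def next(self):
        torch.cuda.current_stream().wait_stream(self.stream)
        inputs, targets = self.next_input, self.next_target
        if inputs is not None:
            inputs.record_stream(torch.cuda.current_stream())
        if targets is not None:
            targets.record_stream(torch.cuda.current_stream())
        self.preload()
        return inputs, targets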
However, if you run the program on your local machine, I highly recommend buying an NVMe drive (e.g., https://www.amazon.com/Samsung-950-PRO-Internal-MZ-V5P256BW-x/dp/B015SOI392). This investment completely solved the problem of slow image loading for me.
So, is the solution to employ DALI and then change:
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
to:
normalize = transforms.Normalize(mean=[0.485*255, 0.456*255, 0.406*255], std=[0.229*255, 0.224*255, 0.225*255])
(since DALI pipelines output image tensors in the 0-255 range rather than ToTensor's 0-1)?
Is DALI helpful in such cases?
If you have any questions or requests, feel free to drop them directly in https://github.com/NVIDIA/DALI.
Sorry, but we are not able to track all the other forum threads about DALI, though we do our best to be responsive on GitHub.
A noticeable speedup with h5py would be seen only when the .h5 file is written without the chunked option.
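For context, the storage layout is fixed at write time; a minimal h5py sketch of the two options (array shape and dataset names are made up):

import h5py
import numpy as np

data = np.random.rand(1000, 224, 224, 3).astype(np.float32)

with h5py.File('images.h5', 'w') as f:
    # Contiguous layout (the default when neither chunks nor compression
    # are requested): per-sample random reads avoid chunk overhead.
    f.create_dataset('images_contiguous', data=data)
    # Chunked layout (required for compression or resizable datasets):
    # random per-sample reads can be slower.
    f.create_dataset('images_chunked', data=data, chunks=(1, 224, 224, 3))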
Hi @Hou_Qiqi, I saw you had a similar problem: you want the dataloader to prefetch data while training is ongoing, basically letting GPU training and the CPU dataloader run in parallel.
Here is our training loop:
for fi, batch in enumerate(my_data_loader):
    train()
and in our dataloader we have defined a collate_fn, cook_data:
DataLoader(my_dataset,
           num_workers=config['num_dataloader_worker'],
           batch_size=config['dataloader_batch_size'],
           timeout=600,
           collate_fn=cook_data)
We observed that the GPU seems to block, waiting for the dataloader to process data. Is there a way to prefetch as you mentioned? And does it work if we use a map-style dataset, not an iterable one?
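For illustration, a prefetcher like the one sketched earlier in this thread can wrap such a loader directly, and a map-style dataset is fine here since the prefetcher only iterates over the DataLoader. This sketch assumes pin_memory=True and that cook_data yields (input, target) batches:

loader = DataLoader(my_dataset,
                    num_workers=config['num_dataloader_worker'],
                    batch_size=config['dataloader_batch_size'],
                    pin_memory=True,  # needed for async host-to-device copies
                    timeout=600,
                    collate_fn=cook_data)

prefetcher = DataPrefetcher(loader)
inputs, targets = prefetcher.next()
while inputs is not None:
    train(inputs, targets)  # your training step
    inputs, targets = prefetcher.next()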
I don’t recommend solution 1, because .bmp is dramatically storage-consuming (80× the original image size in my case). And can you explain more about how to use solution 2?
@Hou_Qiqi, can you share a snippet of how you logged the runtime details of the dataloader?