Hdf5 a data format for pytorch

Carsten_Ditzel · March 20, 2019, 12:21pm

Thanks for joining in, Piotr =)

In my experience, when training from a large HDF5 file containing several large datasets, the GPU idles most of the time, then its utilization suddenly peaks at ~80% before dropping below 5% again. This pattern repeats, so I cannot confirm your statement:

"There is no overhead from opening the hdf5 file and loading data is successfully covered with GPU execution. DataLoader’s __next__ operation (getting next batch) in main process takes below 1% of the profiling time and we have full utilisation of GTX1060!"
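To compare the two observations on equal footing, a rough timing helper along these lines might help. It is only a sketch: `measure_wait_fraction` and `slow_batches` are hypothetical names made up for illustration, and the generator merely stands in for a real DataLoader. On a GPU, the `step` callable would also need to call `torch.cuda.synchronize()` before returning, since kernel launches are asynchronous and the numbers would otherwise be misleading.

```python
import time


def measure_wait_fraction(batches, step):
    """Return the fraction of wall time spent waiting for the next batch.

    `batches` is any iterable (e.g. a DataLoader); `step` is a callable that
    consumes one batch (e.g. a forward/backward pass that synchronizes the GPU).
    """
    wait = 0.0
    start = time.perf_counter()
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # this is the call the quoted profile refers to
        except StopIteration:
            break
        wait += time.perf_counter() - t0
        step(batch)
    total = time.perf_counter() - start
    return wait / total if total > 0 else 0.0


def slow_batches(n, delay):
    """Hypothetical stand-in for a loader that is starved by slow reads."""
    for i in range(n):
        time.sleep(delay)
        yield i


# Loader-bound case: almost all time is spent waiting for data.
loader_bound = measure_wait_fraction(slow_batches(5, 0.02), lambda b: None)
# Compute-bound case: batches are instant, the step dominates.
compute_bound = measure_wait_fraction(range(5), lambda b: time.sleep(0.02))
```

A fraction close to 1 would point at the data pipeline (consistent with an idling GPU), while a value near 0 would match the "below 1%" figure quoted above.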

I have 8 CPU cores and a GTX 1080 Ti with 11 GB. May I ask how many workers you use for data loading?

Also, have you experienced problems where data is corrupted or mixed up when read from HDF5 with multiple workers, as described e.g. here?
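For context, the workaround usually suggested for multi-worker reads is to avoid sharing one open file handle across processes and instead open the file lazily inside each worker. The sketch below assumes `h5py` and NumPy; `LazyH5Dataset` is a hypothetical name, and whether this resolves the corruption reports in every setup is an assumption, not something confirmed in this thread.

```python
import h5py
import numpy as np


class LazyH5Dataset:
    """Map-style dataset that opens the HDF5 file on first access.

    It only defines __len__ and __getitem__, which is the interface a
    map-style dataset needs; in a PyTorch project one would normally
    subclass torch.utils.data.Dataset.
    """

    def __init__(self, path, key):
        self.path = path
        self.key = key
        self._file = None  # opened lazily, so each worker gets its own handle
        # Read the length with a short-lived handle that is closed right away.
        with h5py.File(path, "r") as f:
            self._length = f[key].shape[0]

    def __len__(self):
        return self._length

    def __getitem__(self, idx):
        if self._file is None:
            # First access in this process: open a private read-only handle.
            self._file = h5py.File(self.path, "r")
        return np.asarray(self._file[self.key][idx])

    def __getstate__(self):
        # h5py handles cannot be pickled; drop the handle so a copy sent to
        # a worker (e.g. with the spawn start method) reopens the file itself.
        state = self.__dict__.copy()
        state["_file"] = None
        return state
```

One caveat: with the fork start method, indexing the dataset in the main process before the workers start would leave an open handle that the workers inherit, which is exactly the sharing this pattern tries to avoid.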