HDF5 Read time difference between two different clusters

I have been experiencing really slow read times (when I call data.next() in my training loop) when I use Google Cloud Compute. My school's HPC reads the same file about 20x faster, and I couldn't find the reason for the difference.

Any help is appreciated!

def __getitem__(self, idx):
    if self.f_open == False:
        f = h5py.File("../filename.hdf5", 'r')["delta_HI"]
        self.f_open = True

    return f[c['x'][0]:c['x'][1],
             c['y'][0]:c['y'][1],
             c['z'][0]:c['z'][1]]

EDIT: The code snippet above doesn't work (f is a local variable, so on the second call the flag is set but f no longer exists). Had to change it to:

    f = h5py.File("../filename.hdf5", 'r')["delta_HI"]

    return f[c['x'][0]:c['x'][1],
             c['y'][0]:c['y'][1],
             c['z'][0]:c['z'][1]]
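For completeness, the usual pattern is to open the file lazily and cache the handle on the dataset instance, so each DataLoader worker process gets its own handle. This is a sketch, not the exact code from the post: the class name, the `chunks` list of coordinate dicts, and the `../filename.hdf5` / `delta_HI` names are assumptions based on the snippets above.

```python
import h5py


class CubeDataset:
    """Sketch of a map-style dataset reading 3D blocks from HDF5.

    The h5py handle is opened lazily in __getitem__ and cached on
    self, so each worker process opens the file once rather than on
    every call (and rather than sharing one handle across workers).
    """

    def __init__(self, path, chunks):
        self.path = path
        self.chunks = chunks   # assumed: list of dicts with 'x'/'y'/'z' ranges
        self.dset = None       # opened lazily, once per process

    def __len__(self):
        return len(self.chunks)

    def __getitem__(self, idx):
        if self.dset is None:
            self.dset = h5py.File(self.path, 'r')["delta_HI"]
        c = self.chunks[idx]
        return self.dset[c['x'][0]:c['x'][1],
                         c['y'][0]:c['y'][1],
                         c['z'][0]:c['z'][1]]
```

With a chunk list like `[{'x': (0, 64), 'y': (0, 64), 'z': (0, 64)}, ...]` this behaves like the fixed snippet above, minus the reopen on every call.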

Hi,

Isn’t the difference simply the read speed of the drives? E.g. if you have a local SSD on the school’s HPC and remote HDD storage on Google Cloud?
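A quick way to test that hypothesis is to time a raw sequential read of a large file (e.g. the HDF5 file itself) on both machines, bypassing h5py entirely. A sketch; the function name and block size are illustrative, and on Linux you should drop the page cache first (`sync; echo 3 > /proc/sys/vm/drop_caches` as root) or a repeat run will measure RAM, not disk:

```python
import time


def read_throughput_mb_s(path, block_size=1 << 20):
    """Sequentially read `path` and return throughput in MB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / (1024 * 1024) / elapsed
```

If the number differs by an order of magnitude between the two clusters, the storage, not h5py, explains the gap.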

Turns out that is the exact reason! I configured a Google Cloud Compute instance with a Local SSD and it is now even faster than my school’s HPC.

Thanks @albanD!