General method of retrieval of h5py files as training data

I’m quite new to PyTorch and am a bit unsure whether my method of storing and retrieving training data is efficient. To clarify, my code works as expected and runs without errors, but it is slow.

I’m using h5py (which I am also new to) and tried to model my functions after the suggestions in the forum post: DataLoader, when num_worker >0, there is bug

My training data has shapes X = (400000, 1000) and y = (400000, 3) and is saved as an HDF5 file using:

import h5py

with h5py.File(fileName, 'w') as f:
    # one group per sample, each holding one row of X and one row of y
    for i in range(X.shape[0]):
        f.create_dataset('%s/data_X' % i, data=X[i])
        f.create_dataset('%s/data_y' % i, data=y[i])
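
For comparison, a simpler layout would be two contiguous datasets rather than one group per sample; this is only a sketch of what I mean, not what I currently use, and I haven’t tested whether it changes anything:

import h5py

with h5py.File(fileName, 'w') as f:
    f.create_dataset('data_X', data=X)  # one (400000, 1000) dataset
    f.create_dataset('data_y', data=y)  # one (400000, 3) dataset
    # reading sample i back would then just be f['data_X'][i]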

Later, when I want to train my network, my retrieval method looks like this:

import h5py
import torch.utils.data as Data

class H5Dataset(Data.Dataset):
    def __init__(self, h5_path):
        self.h5_path = h5_path
        self._h5_gen = None

    def __getitem__(self, index):
        # build the generator on first access, so the file is opened
        # inside whichever process actually reads from it
        if self._h5_gen is None:
            self._h5_gen = self._get_generator()
            next(self._h5_gen)  # advance to the first yield
        return self._h5_gen.send(index)

    def _get_generator(self):
        # keeps the file handle open across __getitem__ calls; indices
        # come in via send() and (X, y) pairs are yielded back out
        with h5py.File(self.h5_path, 'r') as record:
            index = yield
            while True:
                X = record[str(index)]['data_X'][()]
                y = record[str(index)]['data_y'][()]
                index = yield X, y

    def __len__(self):
        with h5py.File(self.h5_path, 'r') as record:
            return len(record)
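
For comparison, here is the open-on-first-access pattern I have seen suggested elsewhere; this is only a sketch (the name H5DatasetLazy is mine), and I have not measured whether it behaves any differently:

import h5py
import torch.utils.data as Data

class H5DatasetLazy(Data.Dataset):
    def __init__(self, h5_path):
        self.h5_path = h5_path
        self._file = None  # opened lazily, so each worker gets its own handle

    def __getitem__(self, index):
        if self._file is None:
            self._file = h5py.File(self.h5_path, 'r')
        group = self._file[str(index)]
        return group['data_X'][()], group['data_y'][()]

    def __len__(self):
        with h5py.File(self.h5_path, 'r') as record:
            return len(record)

In any case, my current loading setup is: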

BATCH_SIZE = 400
loader = Data.DataLoader(
        dataset=H5Dataset(fileName), 
        batch_size=BATCH_SIZE, 
        shuffle=True, num_workers=0)

for i, (X_batch, y_batch) in enumerate(loader):
    pass  # training occurs here
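
For reference, this is roughly how I timed a pass over the loader (a sketch, with the training body left out):

import time

t0 = time.perf_counter()
for X_batch, y_batch in loader:
    pass  # just iterate, no training
print('one full pass took %.1f s' % (time.perf_counter() - t0))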

Running through one full pass over the loader (without any training) takes about 4 minutes, which to my untrained eye seems like it could be faster. If so, what could I do to improve it?