DataLoader pin_memory does not help speed up data loading

Hi everyone. I am using DataLoader to read HDF5 files (HDF5 1.10+). When I set num_workers=8 and pin_memory=True, I get 70 samples per second, which is only as fast as loading in the main process alone (num_workers=0).
However, if I set num_workers=8 and pin_memory=False, I get 110 samples per second.
I do not know why pin_memory=True does not help; it is actually slower. I am only timing the data-loading code, excluding the transfer to the GPU.

My Dataset code follows https://discuss.pytorch.org/t/dataloader-when-num-worker-0-there-is-bug/25643/16 and looks like this:

import h5py
import torch


class H5Dataset(torch.utils.data.Dataset):
    def __init__(self, path):
        self.file_path = path
        self.dataset = None
        with h5py.File(self.file_path, 'r') as file:
            self.dataset_len = len(file["dataset"])

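    def __getitem__(self, index):
        # Open the file lazily on first access, so each worker process
        # gets its own h5py handle instead of sharing one from the parent.
        if self.dataset is None:
            self.dataset = h5py.File(self.file_path, 'r')["dataset"]
        return self.dataset[index]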

    def __len__(self):
        return self.dataset_len
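
The throughput comparison described above could be reproduced with a timing helper along these lines. This is a minimal sketch, not the exact benchmark script behind the numbers in this post; `samples_per_second` and `max_batches` are names made up for illustration, and it assumes every batch supports `len()` (true for a single tensor, array, or list per batch).

```python
import time


def samples_per_second(batches, max_batches=100):
    """Iterate over `batches` and return the number of samples consumed per second.

    `batches` is assumed to be any iterable whose items support len(),
    for example a DataLoader that yields one tensor per batch.
    """
    n_samples = 0
    # The clock starts before iteration begins, so with a DataLoader the
    # result also includes the worker start-up time.
    start = time.perf_counter()
    for i, batch in enumerate(batches, start=1):
        n_samples += len(batch)
        if i >= max_batches:
            break
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very fast, tiny inputs.
    return n_samples / elapsed if elapsed > 0 else float("inf")
```

Calling it once on `DataLoader(H5Dataset(path), num_workers=8, pin_memory=True)` and once on the same loader with `pin_memory=False` would give the two numbers to compare, where `path` stands for the HDF5 file. Raising `max_batches` makes the worker start-up cost matter less in the result.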