Memory-efficient HDF5 dataset loading

I am trying to train a model on a dataset of ~11 million samples of 1D vectors stored in an HDF5 file. Everything runs fine with smaller datasets, but with the full dataset training effectively hangs: after several hours on a GPU I cannot get through even a single epoch. I am wondering whether I am exhausting system memory, even though I think I am doing lazy loading. The code I am using is similar to the following:

import torch

class ConcatDataset(torch.utils.data.Dataset):
    def __init__(self, xdata, ydata):
        # xdata/ydata are lists of HDF5 dataset objects; nothing is loaded here
        self.xdatasets = xdata
        self.ydatasets = ydata

    def __getitem__(self, i):
        # read the i-th sample from each dataset and concatenate into one vector
        x = torch.cat([torch.tensor(d[i]) for d in self.xdatasets])
        y = torch.cat([torch.tensor(d[i]) for d in self.ydatasets])
        return (x.to(device), y.to(device))

    def __len__(self):
        return min(len(d) for d in self.xdatasets)

train_loader = torch.utils.data.DataLoader(
    ConcatDataset(x, y),
    batch_size=args.batch_size, shuffle=True)

I would appreciate a sanity check that I am not missing something obvious above. I am using a batch size of 100.
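
For completeness, x and y above are lists of h5py dataset objects. The file is opened roughly like this (simplified; the file name and dataset keys here are placeholders for my actual layout), which is why I believe each d[i] in __getitem__ should only read a single sample from disk:

import h5py

# Open the HDF5 file once and keep it open for the lifetime of training.
# f["name"] returns a lazy h5py Dataset handle; nothing is read into memory
# until it is indexed inside __getitem__.
f = h5py.File("train.h5", "r")           # placeholder file name
x = [f["features_a"], f["features_b"]]   # placeholder dataset names, each shape (N, D)
y = [f["labels"]]                        # placeholder dataset name, shape (N, K)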