CUDA out of memory error when training model with HDF5 dataset

Hi, I am trying to train my model with an HDF5 dataset. I wrap the HDF5 data in a torch.utils.data.Dataset and then use a DataLoader to feed samples to the model. The model trains fine without HDF5, but that is extremely slow, so I modified my training pipeline to read samples from HDF5. With HDF5, I now run into a CUDA out of memory error.

Some details: my application is image segmentation, and the data is stored in two HDF5 files (one for images, one for labels). I train on 2 GPUs via nn.DataParallel with a total batch size of 8 (split between the 2 GPUs). I have tried setting the number of workers to zero, but that does not help. I also tried opening the HDF5 files inside __getitem__ of my torch.utils.data.Dataset (see the sketch below), to rule out problems caused by opening the files before __getitem__, but that does not help either.

I do not understand: if the model trains fine without HDF5, why does it throw this error with HDF5?
The out-of-memory error appears after three training steps.
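
Roughly, my Dataset follows the lazy-open pattern below (a simplified sketch; the file paths and the "images"/"labels" dataset names are placeholders for my actual data):

```python
import h5py
import torch
from torch.utils.data import Dataset

class H5SegmentationDataset(Dataset):
    def __init__(self, images_path, labels_path):
        self.images_path = images_path
        self.labels_path = labels_path
        self.images = None
        self.labels = None
        # Only read the dataset length up front, then close the file again.
        with h5py.File(images_path, "r") as f:
            self.length = len(f["images"])

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        # Open the HDF5 files lazily inside __getitem__ so each DataLoader
        # worker gets its own file handle instead of sharing one opened
        # in __init__.
        if self.images is None:
            self.images = h5py.File(self.images_path, "r")["images"]
            self.labels = h5py.File(self.labels_path, "r")["labels"]
        image = torch.from_numpy(self.images[idx])
        label = torch.from_numpy(self.labels[idx])
        return image, label
```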