Loading an HDF5 dataset with DataLoader

Hi,
I have two HDF5 files, one for training and one for testing, that contain cat and non-cat images (64x64x3; 209 training images and 50 test images).
Each file contains a list of classes (0 for non-cat, 1 for cat), train_set_x → the images, and train_set_y → the labels for those images.

I know I need to write a custom dataset with __init__, __getitem__ and __len__, but what should those methods contain? And what code do I need to actually load the data?

I just want to load the data and be able to enumerate over it, shuffle it, transform it and so on, just like with the MNIST dataset from torchvision.datasets.

I'm really lost! Thank you in advance!
bbba

One possible option is to have __init__ load both train_set_x and train_set_y into memory; __getitem__ then accepts an index and returns an (image, label) tuple, and __len__ returns the number of samples in the training set.
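A minimal sketch of that in-memory approach is shown here. It assumes each HDF5 file stores the images under a key like train_set_x with shape (N, 64, 64, 3) and dtype uint8, and the labels under train_set_y as a flat array of 0/1 values. The class name CatDataset, the helper make_loader, and the default scaling to [0, 1] are illustrative choices, not an official API:

```python
import h5py
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class CatDataset(Dataset):
    """Hypothetical map-style dataset that reads a whole HDF5 split into memory once."""

    def __init__(self, h5_path, x_key="train_set_x", y_key="train_set_y", transform=None):
        # Assumed layout: images (N, 64, 64, 3) uint8, labels N values of 0/1.
        with h5py.File(h5_path, "r") as f:
            self.images = np.array(f[x_key])
            # reshape(-1) guards against labels stored as (1, N) instead of (N,).
            self.labels = np.array(f[y_key]).reshape(-1)
        self.transform = transform

    def __len__(self):
        # Number of samples in this split.
        return len(self.labels)

    def __getitem__(self, idx):
        image = self.images[idx]
        label = int(self.labels[idx])
        if self.transform is not None:
            # e.g. torchvision.transforms.ToTensor() accepts HWC uint8 arrays.
            image = self.transform(image)
        else:
            # Default: HWC uint8 -> CHW float32 tensor scaled to [0, 1].
            image = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0
        return image, label


def make_loader(h5_path, batch_size=32, shuffle=True,
                x_key="train_set_x", y_key="train_set_y", transform=None):
    """Hypothetical helper: wrap the dataset in a shuffling DataLoader."""
    dataset = CatDataset(h5_path, x_key=x_key, y_key=y_key, transform=transform)
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)
```

With the training file, you could then iterate with `for i, (images, labels) in enumerate(make_loader(path_to_train_file)):`; for the test file you would pass its own keys (presumably test_set_x / test_set_y, but check the file) and `shuffle=False`. Loading everything up front is reasonable here because a few hundred 64x64x3 images take only a few megabytes.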

If the dataset doesn't fit into memory, you can consider other implementations instead (e.g. lazy loading, or an iterable-style dataset).
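For the lazy-loading variant, here is a rough sketch under the same assumptions about key names and shapes (the class name LazyH5Dataset is hypothetical). Only the length is read in __init__; each __getitem__ call reads a single image slice from disk. The file handle is opened on first access rather than in __init__, a pattern commonly suggested because an open h5py file handle does not transfer cleanly to DataLoader worker processes when num_workers > 0:

```python
import h5py
import torch
from torch.utils.data import Dataset


class LazyH5Dataset(Dataset):
    """Hypothetical map-style dataset that reads one sample from disk per access."""

    def __init__(self, h5_path, x_key="train_set_x", y_key="train_set_y"):
        self.h5_path = h5_path
        self.x_key = x_key
        self.y_key = y_key
        self._file = None  # opened lazily, i.e. once per worker process
        # Only the number of samples is read here; no image data is loaded.
        # Assumes the labels are stored as a 1-D array of length N.
        with h5py.File(h5_path, "r") as f:
            self._length = len(f[y_key])

    def _h5(self):
        # Open the file on first use inside whichever process calls __getitem__.
        if self._file is None:
            self._file = h5py.File(self.h5_path, "r")
        return self._file

    def __len__(self):
        return self._length

    def __getitem__(self, idx):
        f = self._h5()
        image = f[self.x_key][idx]        # reads a single (64, 64, 3) slice
        label = int(f[self.y_key][idx])
        # HWC uint8 -> CHW float32 tensor scaled to [0, 1].
        image = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0
        return image, label
```

It plugs into `DataLoader` exactly like the in-memory version; the trade-off is a disk read per sample instead of a one-time load, which only pays off when the data is too large to hold in RAM.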

You may find our tutorial helpful as well.