In my experience, the best approach is to first pack all your images into a single HDF5 file, which is easy to do following the h5py documentation at http://docs.h5py.org/en/latest/. For training, build a class inheriting from Dataset that returns your images. Something along these lines:
class dataset_h5(torch.utils.data.Dataset):
    def __init__(self, in_file):
        super(dataset_h5, self).__init__()
        # Keep the file handle open for the lifetime of the dataset.
        self.file = h5py.File(in_file, 'r')
        self.n_images, self.nx, self.ny = self.file['images'].shape

    def __getitem__(self, index):
        # Read one image from disk and convert it to float32 for PyTorch.
        image = self.file['images'][index, :, :]
        return image.astype('float32')

    def __len__(self):
        return self.n_images
Then you can build your loader with:

self.train_loader = torch.utils.data.DataLoader(dataset_h5(train_file),
                                                batch_size=16, shuffle=True)
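For completeness, building the HDF5 file itself only takes a few lines. A minimal sketch, where the file name `images.h5`, the dataset name `images`, and the image dimensions are placeholders to adapt to your data (random arrays stand in for real images here):

```python
import h5py
import numpy as np

# Stand-in for your real image stack: 100 grayscale images of 256x256.
images = np.random.rand(100, 256, 256)

# Write everything into one dataset named 'images', stored as float32.
with h5py.File('images.h5', 'w') as f:
    f.create_dataset('images', data=images, dtype='float32')
```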
My experience with this approach has been quite positive: the GPU stays at 100% utilization even though I'm loading fairly heavy images.
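To close the loop, here is a self-contained sketch of consuming such a loader in a training loop. The file name, image count, and sizes are made up for illustration, and single-process loading is assumed (an open HDF5 handle is generally not safe to share across DataLoader worker processes):

```python
import h5py
import numpy as np
import torch

# Build a small throwaway HDF5 file so the example runs on its own.
with h5py.File('train.h5', 'w') as f:
    f.create_dataset('images', data=np.random.rand(64, 32, 32), dtype='float32')

class dataset_h5(torch.utils.data.Dataset):
    def __init__(self, in_file):
        super(dataset_h5, self).__init__()
        self.file = h5py.File(in_file, 'r')
        self.n_images, self.nx, self.ny = self.file['images'].shape

    def __getitem__(self, index):
        return self.file['images'][index, :, :].astype('float32')

    def __len__(self):
        return self.n_images

loader = torch.utils.data.DataLoader(dataset_h5('train.h5'),
                                     batch_size=16, shuffle=True)

for batch in loader:
    # batch is a float32 tensor of shape (16, 32, 32); in a real training
    # step you would move it to the GPU with batch.cuda() and feed the model.
    pass
```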