I would like to implement, in PyTorch, something equivalent to the combination of the tf.data.Dataset features cache and shuffle.
More precisely, to speed up training, I would like to write code equivalent to:
inp, gt = train_generator[0]
train_dataset = tf.data.Dataset.from_generator(train_generator,
                                               output_shapes=(inp.shape, gt.shape),
                                               output_types=(inp.dtype, gt.dtype)).batch(batch_size)
train_dataset = train_dataset.prefetch(4)
train_dataset = train_dataset.cache()
train_dataset = train_dataset.shuffle(len(train_generator) + 1, reshuffle_each_iteration=True)
I have already found some partial answers, such as Best practice to cache the entire dataset during first epoch, but I think none of them supports shuffling.
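To make the idea concrete, this is the kind of caching wrapper I have in mind (a minimal sketch of my own; CachedDataset is not an existing torch API, it just keeps every sample in memory after the first read):

from torch.utils.data import Dataset

class CachedDataset(Dataset):
    """Sketch: store each sample in memory the first time it is read."""
    def __init__(self, dataset):
        self.dataset = dataset
        self.cache = [None] * len(dataset)

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        # First epoch: read from the wrapped dataset and remember the result;
        # later epochs: return the cached tensors directly.
        if self.cache[index] is None:
            self.cache[index] = self.dataset[index]
        return self.cache[index]

My worry is that this naive list cache lives in the main process, so it would not be shared across DataLoader workers; each worker would rebuild its own copy.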
My starting point is my dataset class:
import numpy as np
import torch
from scipy import io
from torch.utils.data import Dataset

class MatDataset(Dataset):
    def __init__(self, img_path_list):
        self.img_path_list = img_path_list

    def __len__(self):
        return len(self.img_path_list)

    def __getitem__(self, index):
        # Load the panchromatic and multispectral images from the .mat file
        temp = io.loadmat(self.img_path_list[index])
        I_PAN = temp['I_PAN']
        I_MS = temp['I_MS']
        # I_MS = interp23tap(I_MS, 4)
        I_MS = np.moveaxis(I_MS, -1, 0)    # HWC -> CHW
        I_PAN = np.expand_dims(I_PAN, 0)   # add a channel dimension
        return torch.from_numpy(I_MS.astype(np.float32)), torch.from_numpy(I_PAN.astype(np.float32))
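For completeness, this is how I imagine wiring everything together, assuming the CachedDataset sketch above (img_path_list and batch_size are placeholders):

from torch.utils.data import DataLoader

train_dataset = CachedDataset(MatDataset(img_path_list))
train_loader = DataLoader(train_dataset,
                          batch_size=batch_size,
                          shuffle=True,   # reshuffles indices every epoch, like reshuffle_each_iteration=True
                          num_workers=0)  # workers > 0 would each hold a private cache

If I understand correctly, DataLoader's prefetch_factor could play the role of prefetch(4), but it only applies when num_workers > 0, which clashes with the in-memory cache above; that is exactly the part I cannot get right.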
How can I achieve this?
Thanks a lot!