If you are trying to sample data from multiple datasets, I would recommend wrapping all of these unshuffled datasets in a custom Dataset
and shuffling this “wrapper” dataset:
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, datasetA, datasetB):
        self.datasetA = datasetA
        self.datasetB = datasetB

    def __getitem__(self, index):
        # The same index is used for both datasets, so the samples stay paired
        xA = self.datasetA[index]
        xB = self.datasetB[index]
        return xA, xB

    def __len__(self):
        # Assumes datasetB is at least as long as datasetA
        return len(self.datasetA)

datasetA = ...
datasetB = ...
dataset = MyDataset(datasetA, datasetB)
loader = DataLoader(dataset, batch_size=10, shuffle=True)
This shuffles the indices of MyDataset only, so each shuffled index is applied to
both internal datasets and the samples stay paired.
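
To see why the pairing survives shuffling, here is a rough plain-Python sketch of the same idea without PyTorch. PairedDataset and shuffled_batches are hypothetical names used only for this illustration, and the shuffling is a simplified stand-in for what a DataLoader with shuffle=True does (draw a random permutation of indices, then index the dataset with it), not its actual implementation. The length check is an extra assumption that both datasets should be the same size.

```python
import random


class PairedDataset:
    """Plain-Python stand-in for the wrapper dataset (hypothetical name)."""

    def __init__(self, dataset_a, dataset_b):
        # Assumption: the two datasets are meant to line up one-to-one
        if len(dataset_a) != len(dataset_b):
            raise ValueError("datasets must have the same length")
        self.dataset_a = dataset_a
        self.dataset_b = dataset_b

    def __getitem__(self, index):
        # One index is applied to both internal datasets
        return self.dataset_a[index], self.dataset_b[index]

    def __len__(self):
        return len(self.dataset_a)


def shuffled_batches(dataset, batch_size, seed=0):
    """Yield batches of samples, visiting the indices in a random order."""
    indices = list(range(len(dataset)))
    random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


# Toy data: every element of data_b is 100 times its partner in data_a
data_a = list(range(10))
data_b = [x * 100 for x in data_a]

paired = PairedDataset(data_a, data_b)
batches = list(shuffled_batches(paired, batch_size=4))
```

Whatever order the indices come out in, every pair in batches still satisfies b == a * 100, because only the wrapper's indices are shuffled and never the internal datasets themselves.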