DataLoader: shuffle in the same order with multiple datasets

If you are trying to sample data from multiple datasets in the same order, I would recommend wrapping the unshuffled datasets in a custom Dataset and shuffling this “wrapper” dataset:

from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, datasetA, datasetB):
        # Both datasets are assumed to be index-aligned and of equal length
        self.datasetA = datasetA
        self.datasetB = datasetB

    def __getitem__(self, index):
        # The same index is applied to both datasets, so the pair stays aligned
        xA = self.datasetA[index]
        xB = self.datasetB[index]
        return xA, xB

    def __len__(self):
        return len(self.datasetA)

datasetA = ...
datasetB = ...
dataset = MyDataset(datasetA, datasetB)
loader = DataLoader(dataset, batch_size=10, shuffle=True)

With shuffle=True, the DataLoader shuffles the indices of MyDataset only. Each sampled index is then passed to both internal datasets, so the samples from datasetA and datasetB stay paired in every batch.
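
If you want to convince yourself that the pairing survives shuffling, a small check along these lines might help. This is only a sketch on toy data: PairedDataset and check_alignment are hypothetical names, the TensorDataset inputs are made up for illustration, and it assumes both datasets have the same length.

```python
import torch
from torch.utils.data import Dataset, DataLoader, TensorDataset


class PairedDataset(Dataset):
    """Hypothetical stand-in for the wrapper above: one index, two datasets."""

    def __init__(self, dataset_a, dataset_b):
        # Assumption: the datasets are index-aligned and equally long
        if len(dataset_a) != len(dataset_b):
            raise ValueError("both datasets must have the same length")
        self.dataset_a = dataset_a
        self.dataset_b = dataset_b

    def __getitem__(self, index):
        return self.dataset_a[index], self.dataset_b[index]

    def __len__(self):
        return len(self.dataset_a)


def check_alignment(num_samples=20, batch_size=5, seed=0):
    # Toy data: sample i of the second dataset is always 10 * sample i of the first
    values = torch.arange(num_samples)
    dataset_a = TensorDataset(values)
    dataset_b = TensorDataset(values * 10)

    # A seeded generator makes the shuffle order reproducible
    generator = torch.Generator().manual_seed(seed)
    loader = DataLoader(
        PairedDataset(dataset_a, dataset_b),
        batch_size=batch_size,
        shuffle=True,
        generator=generator,
    )

    seen = []
    for (batch_a,), (batch_b,) in loader:
        # If the pairing were broken, this relation would not hold
        assert torch.equal(batch_b, batch_a * 10)
        seen.extend(batch_a.tolist())
    return seen


if __name__ == "__main__":
    order = check_alignment()
    print(order)  # a shuffled permutation of 0..19
```

The assertion inside the loop only passes if every batch pulls the same indices from both datasets, which is what the wrapper approach relies on.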
