Hello, I am using two datasets to train my model. The two datasets have a one-to-one correspondence (same length), so when I enable shuffle in the DataLoader, I need the batches drawn from both datasets to use the same indices. Can you give me some ideas?
You can create a custom dataset composed of two datasets.
First read the following tutorial if you are not familiar with custom datasets:
https://pytorch.org/tutorials/beginner/data_loading_tutorial.html#data-loading-and-processing-tutorial
For example:
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, dataset1, dataset2):
        # assumes both datasets are index-aligned and have the same length
        self.dataset1 = dataset1
        self.dataset2 = dataset2

    def __len__(self):
        return len(self.dataset1)

    def __getitem__(self, idx):
        # a single idx fetches the corresponding item from both datasets
        sample = {'data1': self.dataset1[idx], 'data2': self.dataset2[idx]}
        return sample
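To illustrate, here is a minimal runnable sketch of the idea above: the DataLoader shuffles indices, but because each `__getitem__` call pulls from both datasets with the same `idx`, the pairs stay aligned. The `PairedDataset` name and the toy tensors (where `data2[i] == data1[i] * 10`) are made up for demonstration.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class PairedDataset(Dataset):
    def __init__(self, dataset1, dataset2):
        assert len(dataset1) == len(dataset2), "datasets must have equal length"
        self.dataset1 = dataset1
        self.dataset2 = dataset2

    def __len__(self):
        return len(self.dataset1)

    def __getitem__(self, idx):
        # one idx indexes both datasets, so pairs survive shuffling
        return {'data1': self.dataset1[idx], 'data2': self.dataset2[idx]}

# toy aligned data: data2[i] is always data1[i] * 10
data1 = torch.arange(8)
data2 = torch.arange(8) * 10

loader = DataLoader(PairedDataset(data1, data2), batch_size=4, shuffle=True)

for batch in loader:
    # batches come out in a shuffled order, but each pair is still matched
    assert torch.equal(batch['data2'], batch['data1'] * 10)
```

The default collate function stacks the dict values into batched tensors, so each batch is a dict with `'data1'` and `'data2'` tensors whose rows correspond.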
Thank you so much. I will try it!