I have two parallel datasets dataset1
and dataset2
and following is my code to load them in parallel using SubsetRandomSampler
where I provide train_indices
for dataloading.
P.S. Even after setting num_workers=0
and seeing np.random
, the samples do not get loaded in parallel. Any suggestions are heartily welcome including methods other than SubsetRandomSampler
.
import torch, numpy as np
from torch.utils.data import Dataset, DataLoader, SubsetRandomSampler
dataset1 = torch.tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
dataset2 = torch.tensor([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
train_indices = list(range(len(dataset1)))
np.random.seed(12)
np.random.shuffle(train_indices)
sampler = SubsetRandomSampler(train_indices)
dataloader1 = DataLoader(dataset1, batch_size=2, num_workers=0, sampler=sampler)
dataloader2 = DataLoader(dataset2, batch_size=2, num_workers=0, sampler=sampler)
for i, (data1, data2) in enumerate(zip(dataloader1, dataloader2)):
x = data1
y = data2
print(x, y)
Output:
tensor([5, 1]) tensor([15, 18])
tensor([0, 2]) tensor([14, 12])
tensor([4, 6]) tensor([16, 10])
tensor([8, 9]) tensor([11, 19])
tensor([7, 3]) tensor([17, 13])
Expected Output:
tensor([5, 1]) tensor([15, 11])
tensor([0, 2]) tensor([10, 12])
tensor([4, 6]) tensor([14, 16])
tensor([8, 9]) tensor([18, 19])
tensor([7, 3]) tensor([17, 13])