I was trying to use some randomness in a dataset I built and ran into some very strange behavior.
For simplicity, I reproduced it in the following toy example:
import numpy as np
import torch
import torch.utils.data as D

class TestDataset(D.Dataset):
    def __init__(self, mysize=20):
        super(TestDataset, self).__init__()
        self.mysize = mysize

    def __len__(self):
        return self.mysize

    def __getitem__(self, idx):
        # Each item should be a fresh draw of 4 random integers in [0, 10)
        return torch.tensor(np.random.randint(0, 10, size=4))

td = TestDataset(16)
dl = D.DataLoader(td, batch_size=2, num_workers=4)
for d in dl:
    print(d)
The results I get are:
tensor([[6, 2, 0, 6],
[8, 4, 5, 3]])
tensor([[6, 2, 0, 6],
[8, 4, 5, 3]])
tensor([[6, 2, 0, 6],
[8, 4, 5, 3]])
tensor([[6, 2, 0, 6],
[8, 4, 5, 3]])
tensor([[7, 4, 7, 0],
[5, 6, 0, 6]])
tensor([[7, 4, 7, 0],
[5, 6, 0, 6]])
tensor([[7, 4, 7, 0],
[5, 6, 0, 6]])
tensor([[7, 4, 7, 0],
[5, 6, 0, 6]])
Instead of random output, the batches repeat in groups matching the number of workers: with 16 samples and batch_size=2 there are 8 batches, and each run of num_workers=4 consecutive batches is identical.
A Bug? A Feature?
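If the cause is that each forked worker inherits the same NumPy global random state, then reseeding NumPy per worker via DataLoader's worker_init_fn should avoid it. A minimal sketch of that workaround (the seed_worker name is mine, and this assumes the dataset draws from np.random rather than torch's own RNG):

import numpy as np
import torch
import torch.utils.data as D

def seed_worker(worker_id):
    # torch.initial_seed() differs in each worker process, so deriving
    # the NumPy seed from it gives every worker a distinct random stream.
    # NumPy seeds must fit in 32 bits, hence the modulo.
    np.random.seed(torch.initial_seed() % 2**32)

dl = D.DataLoader(td, batch_size=2, num_workers=4, worker_init_fn=seed_worker)

With that in place, each of the 8 batches should come out different.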