Without setting a random seed, the data loader returns the same random data every epoch: epoch 1: worker1->[2], worker2->[2]; epoch 2: worker1->[2], worker2->[2], ...
When I set a random seed in the worker_init_fn function, I get different random data for each worker: epoch 1: worker1->[2], worker2->[4]; epoch 2: worker1->[7], worker2->[6], ...
How can I set a random seed so that each epoch gets new random data, but all workers see the same random data? epoch 1: worker1->[2], worker2->[2]; epoch 2: worker1->[7], worker2->[7], ...
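One way to get this behavior (a sketch, not something built into ignite; make_worker_init_fn and base_seed are hypothetical names) is to give every worker a seed that depends only on the epoch, never on the worker id:

```python
import numpy as np

def make_worker_init_fn(epoch, base_seed=0):
    # hypothetical helper: every worker seeds numpy with the SAME per-epoch
    # value, so workers agree within an epoch but data changes across epochs
    def worker_init_fn(worker_id):
        np.random.seed(base_seed + epoch)  # worker_id deliberately ignored
    return worker_init_fn

def draw(epoch, worker_id):
    # simulate one worker drawing a sample in a given epoch
    make_worker_init_fn(epoch)(worker_id)
    return int(np.random.randint(1000))
```

With a real DataLoader you would need to rebuild the loader (or its worker_init_fn) at the start of each epoch so the seed actually advances.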
However, looking at your DatasetRandom implementation, it seems a bit odd that you get the same random data each epoch, as you said at the very beginning:
Without setting a random seed the data loader returns the same random data for each epoch: epoch 1: worker1->[2], worker2->[2], epoch 2: worker1->[2], worker2->[2],...
Probably you have some unwanted random seed synchronization somewhere. Which ignite version are you using, btw?
It is about PyTorch. I thought Ignite had some elegant solution for this behavior, as you suggested adding a callback with @trainer.on(Events.EPOCH_STARTED). Here is a minimal reproduction:
import numpy as np
from torch.utils.data import DataLoader, Dataset

class DatasetRandom(Dataset):
    def __len__(self):
        return 2

    def __getitem__(self, idx):
        # the "image" is a random int drawn with numpy; the label alternates 0/1
        return (np.random.randint(1000, size=1), idx % 2)

train_loader = DataLoader(DatasetRandom(), batch_size=1, num_workers=2)
for e in range(2):
    print('epoch', e)
    for i, (images, labels) in enumerate(train_loader):
        print(i, images, labels)
I thought Ignite has some elegant solution for this behavior.
In v0.3.0 we had something similar to set_epoch_seed done in the Engine automatically, but we found that it had a lot of side effects, like the ones you see with the evaluator etc.
A seed is set up for torch and Python's random module (not numpy's random) each time a dataloader iterator is created, to randomize the data; so if you replace your np.random.randint(1000, size=1) with random.randint(0, 1000), the data will be random for each epoch.
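Concretely, only the RNG call in the dataset needs to change (a sketch; the DataLoader wiring and the torch Dataset base class are omitted so the snippet stays self-contained, but the class works the same way as the one above):

```python
import random

class DatasetRandom:
    # same toy dataset as above, but drawing from Python's random module,
    # which the DataLoader reseeds per worker every time a new iterator is
    # created, so each epoch sees fresh values
    def __len__(self):
        return 2

    def __getitem__(self, idx):
        return (random.randint(0, 1000), idx % 2)
```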
It depends on what is behind the np.random.randint calls in your real use case. I was thinking about data augmentations that can be parametrized with numpy; if you have control over that, it would be simpler to replace the randomness generation…