Is DataLoader's shuffle determined by torch.manual_seed?

BramVanroy · April 2, 2019, 8:09am

I am testing different set-ups with the same data. To ensure that different results aren’t caused by a different load order of the data, I wish to know how we can make shuffle behave the same way all the time. Is it by setting torch.manual_seed?

jmaronas · April 2, 2019, 8:15am

Be aware that if your model uses cuDNN, you also have to set the deterministic flag to True, which costs some performance:

torch.backends.cudnn.deterministic = True
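As a rough sketch of how that flag is usually applied, the helper below sets it once at the start of a script. The function name `make_cudnn_deterministic` is made up for illustration. Turning off `torch.backends.cudnn.benchmark` as well is my own addition, based on the PyTorch reproducibility notes, so treat it as an assumption rather than part of the advice above.

```python
import torch


def make_cudnn_deterministic() -> None:
    """Ask cuDNN for deterministic algorithms (hypothetical helper name)."""
    # The flag from the post: restrict cuDNN to deterministic kernels.
    torch.backends.cudnn.deterministic = True
    # Assumption, not from the post: also disable the autotuner, which may
    # otherwise pick different convolution algorithms from run to run.
    torch.backends.cudnn.benchmark = False


make_cudnn_deterministic()
```

Both settings are plain attributes, so they can be set on a CPU-only install too; they only matter once cuDNN is actually used.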

In general, a different shuffle of the data produces different gradient estimates and therefore a different convergence path.

So yes, I think setting torch.manual_seed would fix that. I usually set both the torch and numpy seeds. You can also tell the DataLoader the exact order in which to sample: take a look at the sampler argument in https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
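Here is a small sketch of both suggestions, using a toy dataset so the order is easy to inspect. The names `epoch_order`, `seeded_order` and `fixed_indices` are illustrative only. It assumes the seed is set before the loader is iterated and that nothing else draws from the global RNG in between, since any extra random call would change the resulting order.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data: the values 0..9, so each sample's value equals its index.
dataset = TensorDataset(torch.arange(10))


def epoch_order(loader):
    """Return the sample values in the order the loader yields them."""
    return [int(x) for (batch,) in loader for x in batch]


def seeded_order(seed):
    # Seed the global RNGs, then build a shuffling loader and read one epoch.
    np.random.seed(seed)
    torch.manual_seed(seed)
    loader = DataLoader(dataset, batch_size=5, shuffle=True)
    return epoch_order(loader)


# Same seed, same shuffle order.
first = seeded_order(0)
second = seeded_order(0)

# Alternative: skip shuffling and pass the order explicitly via `sampler`.
fixed_indices = [3, 1, 4, 0, 2, 9, 8, 7, 6, 5]
fixed_loader = DataLoader(dataset, batch_size=5, sampler=fixed_indices)
fixed = epoch_order(fixed_loader)
```

With the explicit sampler the order no longer depends on any seed, which can be easier to reason about when comparing set-ups. Note that `sampler` and `shuffle=True` cannot be combined.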