I am testing different set-ups with the same data. To make sure that differences in the results aren’t caused by a different load order of the data, I would like to know how to make shuffling behave the same way every time. Is setting torch.manual_seed enough?
Be aware that if your model uses cuDNN you have to set the deterministic flag to True, at some cost in performance:
torch.backends.cudnn.deterministic = True
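For completeness, a minimal sketch of the settings usually combined for a reproducible run (the benchmark flag is my addition: it disables cuDNN’s autotuner, which can otherwise select different kernels between runs):

```python
import torch

torch.manual_seed(0)                       # seed the CPU (and, on CUDA builds, GPU) RNGs
torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False     # disable cuDNN autotuning, which can
                                           # pick different kernels run to run
```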
In general, a different shuffle of the data produces different estimates of the gradient and therefore different convergence behaviour.
Then yes, I think setting torch.manual_seed would fix that. I usually set both the torch and numpy seeds. You can also tell the DataLoader the order in which you want to sample: take a look at the sampler argument at https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
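As an illustration, here is a small sketch of pinning the shuffle order directly on the DataLoader (the toy dataset and seed value are made up; passing a seeded torch.Generator is one way to make the shuffle repeatable without touching the global seed):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(10).float())  # toy dataset: 0.0 .. 9.0

def make_loader(seed):
    # A dedicated, seeded generator fixes the shuffle order for this loader.
    g = torch.Generator()
    g.manual_seed(seed)
    return DataLoader(dataset, batch_size=2, shuffle=True, generator=g)

# Two loaders built with the same seed yield batches in the same order.
order_a = [batch[0].tolist() for batch in make_loader(seed=42)]
order_b = [batch[0].tolist() for batch in make_loader(seed=42)]
assert order_a == order_b
```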