Effect of np.random.seed(epoch) inside DistributedSampler.set_epoch()?

As a follow-up question to DataLoader behaves the same at every epoch with np.random: if I call np.random.seed(epoch) inside my distributed sampler’s set_epoch(epoch) method, what will happen?

    def set_epoch(self, epoch):
        np.random.seed(epoch)  # re-seed numpy's global RNG with the epoch number
        self.epoch = epoch

Will that screw up the randomness of my samples?
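For context, this is what re-seeding numpy's global RNG does: every draw after the seed call is pinned, so the same epoch number always reproduces the same sequence. A minimal check (the epoch value 5 is arbitrary):

```python
import numpy as np

# re-seeding the global numpy RNG with the epoch number pins every
# subsequent draw: the same seed always yields the same sequence
np.random.seed(5)
a = np.random.rand(3)
np.random.seed(5)
b = np.random.rand(3)
print(bool((a == b).all()))  # True: identical seed, identical sequence
```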

@ptrblck would you happen to know?

The set_epoch method should be called on the DistributedSampler at each epoch, before creating the DataLoader iterator, to avoid producing the same random indices every epoch: internally the indices are shuffled using the base seed plus the current epoch, as seen here.
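A single-process sketch of that behavior (num_replicas and rank are passed explicitly here only so that no process group is needed; the dataset and batch size are arbitrary):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(8))
# explicit num_replicas/rank avoid needing dist.init_process_group here
sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=True)
loader = DataLoader(dataset, batch_size=2, sampler=sampler)

def epoch_order(epoch):
    # set_epoch changes the shuffle seed to base seed + epoch
    sampler.set_epoch(epoch)
    return [int(x) for batch in loader for x in batch[0]]

print(epoch_order(0) == epoch_order(0))  # True: same epoch reproduces the same order
print(epoch_order(1))  # a different epoch reshuffles with a different seed
```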
If you re-seed numpy at each epoch, all following numpy random operations will respect that seed. Since you’ve linked to the DataLoader thread, I assume you are already seeding numpy (and other libraries) in the worker_init_fn of the DataLoader?
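The worker_init_fn seeding mentioned above can be sketched like this; the function name is my own, and folding torch.initial_seed() into a 32-bit value follows the common recipe rather than any exact code from the linked thread:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def worker_init_fn(worker_id):
    # torch.initial_seed() already differs per worker and per epoch,
    # so deriving numpy's 32-bit seed from it keeps numpy-based
    # augmentations varied without manual np.random.seed(epoch) calls
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)

loader = DataLoader(
    TensorDataset(torch.arange(8)),
    batch_size=2,
    num_workers=2,
    worker_init_fn=worker_init_fn,
)
```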