Effect of np.random.seed(epoch) inside DistributedSampler.set_epoch()?

RylanSchaeffer · January 13, 2022, 4:15am

As a follow-up question to "DataLoader behaves the same at every epoch with np.random": if I call np.random.seed(epoch) inside my distributed sampler’s set_epoch(epoch) method, what will happen?

    def set_epoch(self, epoch):
        np.random.seed(epoch)

Will that screw up the randomness of my samples?

RylanSchaeffer · January 13, 2022, 4:15am

@ptrblck would you happen to know?

ptrblck · January 13, 2022, 5:29am

The set_epoch method of the DistributedSampler should be called at the start of each epoch, before creating the DataLoader iterator, to avoid getting the same random indices in every epoch. Internally, the indices are shuffled using the base seed plus the current epoch, as can be seen in the DistributedSampler source.
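The seed-plus-epoch shuffling can be checked with a small sketch. It assumes a recent PyTorch where DistributedSampler accepts a seed argument; the toy dataset and the helper indices_for are made up for illustration, and the hand-written reproduction mirrors what the sampler source does in the versions I'm aware of rather than a documented guarantee.

```python
import torch
from torch.utils.data import DistributedSampler

dataset = list(range(16))  # stand-in dataset; anything with __len__ works here

# Passing num_replicas and rank explicitly means no process group is needed.
sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=True, seed=0)

def indices_for(epoch):
    """Hypothetical helper: indices rank 0 would draw in the given epoch."""
    sampler.set_epoch(epoch)
    return list(sampler)

epoch0 = indices_for(0)
epoch1 = indices_for(1)
epoch0_again = indices_for(0)

# Reproduce epoch 1 by hand: a torch.Generator seeded with seed + epoch,
# a full permutation, then every num_replicas-th index starting at rank.
g = torch.Generator()
g.manual_seed(0 + 1)
manual = torch.randperm(len(dataset), generator=g).tolist()[0:len(dataset):2]

print(epoch0, epoch1, manual, sep="\n")
```

Without set_epoch, the epoch stays at its initial value and every epoch yields the same order. Note that the shuffle uses its own torch.Generator, so it does not read numpy's global state.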
If you re-seed numpy with the epoch at the start of each epoch, all subsequent random operations in numpy will use this seed. Since you’ve linked to the DataLoader thread, I assume you are already seeding numpy (and other libraries) in the worker_init_fn of the DataLoader?
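To make the effect of re-seeding concrete, here is a minimal sketch that assumes the sampler in question subclasses torch.utils.data.DistributedSampler. The class name SeededSampler is hypothetical. The call to super().set_epoch(epoch) matters under that assumption: the base class stores the epoch there, and overriding the method without it would leave the sampler's own shuffle stuck on the initial epoch.

```python
import numpy as np
from torch.utils.data import DistributedSampler

class SeededSampler(DistributedSampler):  # hypothetical subclass for illustration
    def set_epoch(self, epoch):
        super().set_epoch(epoch)  # keep the sampler's own epoch-based shuffle working
        np.random.seed(epoch)     # reset numpy's global RNG in this process

sampler = SeededSampler(list(range(16)), num_replicas=2, rank=0, shuffle=True, seed=0)

sampler.set_epoch(3)
first = np.random.rand(2)

sampler.set_epoch(3)
second = np.random.rand(2)   # same seed, so the same draws as `first`

sampler.set_epoch(4)
third = np.random.rand(2)    # different seed, so a different stream

print(first, second, third, sep="\n")
```

Since the seed depends only on the epoch, every rank that calls set_epoch with the same value ends up with the same numpy stream in its main process. Whether DataLoader worker processes see that state depends on how and when they are started, which is where worker_init_fn comes in.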