The set_epoch method of the DistributedSampler should be called at the beginning of each epoch, before creating the DataLoader iterator. Otherwise every epoch will reuse the same random indices, because internally the indices are shuffled using a generator seeded with the base seed plus the current epoch, as seen here.
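To make the effect visible, here is a minimal sketch. It assumes a toy TensorDataset and passes num_replicas and rank explicitly so that it runs in a single process without initializing a process group; in a real DDP job these values would normally come from the process group. The helper name indices_for_epoch is made up for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy dataset (assumption for illustration only).
dataset = TensorDataset(torch.arange(100))

# num_replicas/rank are given explicitly so no process group is required here.
sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=True, seed=0)
loader = DataLoader(dataset, batch_size=10, sampler=sampler)


def indices_for_epoch(epoch, call_set_epoch):
    """Hypothetical helper: return the indices this rank would see in `epoch`."""
    if call_set_epoch:
        sampler.set_epoch(epoch)
    return list(sampler)


# Without set_epoch the sampler's internal epoch stays at 0,
# so every epoch yields the same permutation.
stale = [indices_for_epoch(e, call_set_epoch=False) for e in range(3)]

# With set_epoch the shuffle is re-seeded for each epoch.
fresh = [indices_for_epoch(e, call_set_epoch=True) for e in range(3)]

# Typical training loop shape: set the epoch, then iterate the loader.
for epoch in range(2):
    sampler.set_epoch(epoch)
    for (batch,) in loader:
        pass  # forward/backward would go here
```

Comparing the entries of stale and fresh should show that only the second list changes from epoch to epoch.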
If you are re-seeding numpy in each epoch, all subsequent random operations in numpy will respect this seed. Since you've linked to the DataLoader thread, I assume you are already seeding numpy (and other libraries) in the worker_init_fn of the DataLoader?
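In case it helps, this is roughly what such a worker_init_fn could look like. It is only a sketch following the pattern from the PyTorch reproducibility notes; the function name seed_worker, the toy dataset, and the seed value are assumptions, not something taken from your code.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_worker(worker_id):
    # Inside a worker, torch.initial_seed() is already distinct per worker,
    # so derive the numpy / random seeds from it (numpy needs a 32-bit value).
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


# Toy dataset and base seed (assumptions for illustration only).
dataset = TensorDataset(torch.arange(16))
g = torch.Generator()
g.manual_seed(0)

# The workers are only started once the loader is iterated;
# each one then calls seed_worker with its own worker id.
loader = DataLoader(
    dataset,
    batch_size=4,
    num_workers=2,
    worker_init_fn=seed_worker,
    generator=g,
)
```

With this in place, numpy calls inside the Dataset's __getitem__ would draw from a per-worker seed instead of every worker sharing the same numpy state.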