Distributed Sampling and Shuffling Samples

Hello pytorch devs and users,

My training runs in a distributed environment, and I have to ensure that each parameter is initialized to the same value in every process, so I call the following function before spawning my processes:

import torch

def setRandomSeeds(randomSeed=0):
    torch.manual_seed(randomSeed)  # assumed missing call: randomSeed was otherwise unused
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

It works perfectly. However, this time “DistributedSampler” fails to shuffle the samples at every epoch. What should I turn off/on after weight initialization in every process so that it shuffles the samples properly before every epoch starts?

Thanks in advance.

You might have forgotten to call the sampler.set_epoch() method. From the docs:

In distributed mode, calling the set_epoch() method at the beginning of each epoch before creating the DataLoader iterator is necessary to make shuffling work properly across multiple epochs. Otherwise, the same ordering will be always used.
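
Here is a minimal sketch of what the docs describe, in case it helps. It is not your training loop: the toy dataset, the helper name epoch_orders, and the explicit num_replicas=2 / rank=0 arguments are assumptions that let it run in a single process without init_process_group. It records the order of samples that rank 0 sees in each epoch, once without and once with set_epoch().

```python
import torch
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def epoch_orders(call_set_epoch, num_epochs=3):
    """Return the sample order seen by rank 0 in each epoch (hypothetical helper)."""
    dataset = TensorDataset(torch.arange(16))
    # num_replicas and rank are passed explicitly (an assumption for this sketch);
    # in real training they are normally taken from the process group.
    sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=True, seed=0)
    loader = DataLoader(dataset, batch_size=4, sampler=sampler)

    orders = []
    for epoch in range(num_epochs):
        if call_set_epoch:
            # Must be called before the DataLoader iterator is created for this epoch.
            sampler.set_epoch(epoch)
        orders.append([int(x) for (batch,) in loader for x in batch])
    return orders


without_set_epoch = epoch_orders(call_set_epoch=False)
with_set_epoch = epoch_orders(call_set_epoch=True)

print("without set_epoch:", without_set_epoch)
print("with set_epoch:   ", with_set_epoch)
```

Without set_epoch() the printed order repeats in every epoch; with it, the order changes from epoch to epoch. As far as I can tell, the sampler derives its shuffle from its own seed argument plus the epoch number, so seeding the global RNG for weight initialization should not be what stops the shuffling.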

Sorry for the late reply, @ptrblck. It took me some time to try it out, and the time zone difference probably played a part as well.
Yes! It really did the trick :slight_smile:.
Thank you!