Distributed Sampling and Shuffling Samples

Hello pytorch devs and users,

My training runs in a distributed environment, and I have to ensure that each parameter is initialized to the same value, so I call the following function before spawning my processes:

import random
import numpy as np
import torch

def setRandomSeeds(randomSeed=0):
    # seed all RNGs so every process initializes parameters identically
    torch.manual_seed(randomSeed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    np.random.seed(randomSeed)
    random.seed(randomSeed)

It works perfectly. However, this time DistributedSampler fails to shuffle the samples at every epoch; it is wired up roughly as sketched below. What should I turn off or on after weight initialization in every process so that it shuffles the samples properly before each epoch starts?
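A minimal sketch of that setup (the dataset name, batch size, and distributed init are placeholders, not my actual code):

from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

sampler = DistributedSampler(trainDataset, shuffle=True)  # shuffle=True is also the default
trainLoader = DataLoader(trainDataset, batch_size=32, sampler=sampler)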

Thanks in advance.

You might have forgotten to call the sampler.set_epoch() method. From the docs:

In distributed mode, calling the set_epoch() method at the beginning of each epoch before creating the DataLoader iterator is necessary to make shuffling work properly across multiple epochs. Otherwise, the same ordering will be always used.
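In practice that looks something like this (a minimal sketch; the sampler, loader, and epoch count are placeholders for your own objects):

for epoch in range(numEpochs):
    sampler.set_epoch(epoch)   # reseeds the sampler so each epoch yields a different shuffle
    for batch in trainLoader:  # the DataLoader iterator is created after set_epoch()
        ...                    # forward / backward / optimizer step as usual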


Sorry for the late reply @ptrblck, it took me some time to give it a try, and the time zone difference didn't help either.
Yes! It really did the trick :slight_smile:
Thank you!