Setting seeds for multi-GPU training

When setting a seed for multi-GPU training (DDP), should I set the same seed (e.g. seed=42) for all ranks, or should I set different seeds for different ranks? Or does it not matter?

I've seen some people set seed = args.seed + rank and others set seed = args.seed. Would either of these cause a problem with the dataset? For example, will the dataset get shuffled and distributed to the different processes correctly with both methods, or could different processes accidentally train on the same portion of the dataset because of the same/different seeds?
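For concreteness, the two variants I've seen look roughly like this (args.seed is just a hypothetical command-line argument, and the rank comes from torch.distributed):

```python
import torch
import torch.distributed as dist

def set_seed(args):
    rank = dist.get_rank()

    # Variant A: same seed on every rank
    seed = args.seed

    # Variant B: different seed per rank
    # seed = args.seed + rank

    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
```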

Thanks in advance.

In a DDP setup the DistributedSampler is responsible for splitting the dataset across ranks, and it shuffles using its own seed argument (combined with the epoch set via set_epoch), not the seed you set locally in each process. So neither seeding scheme will break the data partitioning, and no two ranks will train on the same shard; the local seed only affects things like weight initialization, dropout, and augmentations on that rank.
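A minimal sketch of the usual pattern (the dataset here is a placeholder, and it assumes init_process_group has already been called in each process):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Placeholder dataset; use your own Dataset in practice.
dataset = TensorDataset(torch.arange(1000).float())

# The sampler partitions the indices across ranks and shuffles with its
# own seed (default 0) plus the epoch number, independent of whatever
# you passed to torch.manual_seed in this process.
sampler = DistributedSampler(dataset, shuffle=True, seed=0)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

for epoch in range(10):
    # set_epoch changes the shuffling order every epoch; without it,
    # each epoch iterates the data in the same order.
    sampler.set_epoch(epoch)
    for batch in loader:
        ...  # forward/backward as usual
```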
