Is it necessary to set a random seed across multiple GPUs in DistributedDataParallel?

Following the ImageNet example (https://github.com/pytorch/examples/blob/master/imagenet/main.py), it seems that the seed is not set by default (the default is None):
parser.add_argument('--seed', default=None, type=int, help='seed for initializing training. ')
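
For context, when the flag is given, the script seeds the RNGs; the snippet below is a paraphrased sketch of that behavior (not an exact quote of main.py), kept self-contained with its own argument parsing:

```python
import argparse
import random
import warnings

import torch
import torch.backends.cudnn as cudnn

parser = argparse.ArgumentParser()
parser.add_argument('--seed', default=None, type=int,
                    help='seed for initializing training. ')
args = parser.parse_args()

# When --seed is provided, seed the Python and PyTorch RNGs and switch cuDNN
# to deterministic mode; when it is None (the default), nothing is seeded.
if args.seed is not None:
    random.seed(args.seed)
    torch.manual_seed(args.seed)
    cudnn.deterministic = True
    warnings.warn('Seeding training turns on the cuDNN deterministic setting, '
                  'which can slow down training considerably.')
```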

But when we use DistributedDataParallel mode, if the seed is not set, the initialized parameters will differ across GPUs, so different model parameters would be kept on different GPUs during training (although we only save the checkpoint on the rank-0 GPU).

I am not sure whether this would cause unknown errors or lead to unstable results. Is it safe not to set the initialization seed?

This should be fine, because DistributedDataParallel broadcasts the model's parameters and buffers from rank 0 to all other ranks at construction time, so every rank starts training from identical weights even if each process initialized them differently.
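
To convince yourself, here is a minimal sketch (not the DDP source itself) that you can launch on a single node, e.g. with `torchrun --nproc_per_node=2`. Each rank builds an unseeded model, wraps it in DDP, and gathers a parameter checksum from every rank: before wrapping the checksums differ, after wrapping they all match rank 0's.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # Assumes a single-node launch, e.g. `torchrun --nproc_per_node=2 this_script.py`,
    # so the global rank can be used as the local CUDA device index.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    torch.cuda.set_device(rank)

    # No seed is set, so each process draws different initial weights.
    model = torch.nn.Linear(10, 10).cuda(rank)
    before = model.weight.detach().sum().reshape(1)

    # At construction time, DDP broadcasts rank 0's parameters and buffers
    # to all other ranks, overwriting each rank's local initialization.
    ddp_model = DDP(model, device_ids=[rank])
    after = ddp_model.module.weight.detach().sum().reshape(1)

    # Gather both checksums from every rank so rank 0 can print them side by side.
    befores = [torch.zeros_like(before) for _ in range(world_size)]
    afters = [torch.zeros_like(after) for _ in range(world_size)]
    dist.all_gather(befores, before)
    dist.all_gather(afters, after)
    if rank == 0:
        print("before DDP:", [t.item() for t in befores])  # values differ per rank
        print("after DDP: ", [t.item() for t in afters])   # values are identical

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

This only concerns the initial state; after that, DDP averages gradients across ranks at every step, so the replicas stay in sync throughout training regardless of whether a seed was set.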