How to set the random seed in distributed training?

Hi everyone!

I am training a model with torch.distributed, but I am not sure how to set the random seeds. This is my current code:

import numpy as np
import torch
import torch.backends.cudnn as cudnn
import torch.multiprocessing as mp

def main():
    # args is parsed elsewhere, e.g. via argparse
    np.random.seed(args.seed)
    torch.manual_seed(args.seed)
    torch.cuda.manual_seed(args.seed)

    cudnn.enabled = True
    cudnn.benchmark = True
    cudnn.deterministic = True

    mp.spawn(main_worker, nprocs=args.ngpus, args=(args,))

Should I move the

    np.random.seed(args.seed)
    torch.manual_seed(args.seed)
    torch.cuda.manual_seed(args.seed)

    cudnn.enabled = True
    cudnn.benchmark = True
    cudnn.deterministic = True 

into the function main_worker() to make sure every process gets the correct seed and cudnn settings? By the way, I have tried this, and it makes training about twice as slow, which really confuses me.

Thank you very much for any help!

Each process should execute the seeding code.
The slowdown might come from cudnn.deterministic = True, as this forces cudnn to use the default algorithm, which might be slower than the others.
Also, cudnn.benchmark = True won’t have any effect if you set cudnn.deterministic = True.
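For illustration, here is a minimal sketch of what per-process seeding inside main_worker could look like, assuming args is parsed via argparse and carries seed and ngpus (both taken from your snippet). The + rank offset is a hypothetical choice for when each process should draw different random numbers (e.g. for data augmentation); drop it if all processes must sample identically:

import numpy as np
import torch
import torch.backends.cudnn as cudnn
import torch.multiprocessing as mp

def main_worker(rank, args):
    # mp.spawn calls main_worker(rank, *args), so every spawned
    # process executes this seeding code itself.
    np.random.seed(args.seed + rank)
    torch.manual_seed(args.seed + rank)
    torch.cuda.manual_seed(args.seed + rank)

    cudnn.enabled = True
    # deterministic=True forces the default cudnn algorithms,
    # so benchmark=True would have no effect here anyway.
    cudnn.deterministic = True
    cudnn.benchmark = False

    # ... set the device, init the process group, build the model, train ...

def main(args):
    mp.spawn(main_worker, nprocs=args.ngpus, args=(args,))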


Thank you very much! I get it!