How to get deterministic behavior with DistributedDataParallel?

Hello, my code behaves deterministically without DistributedDataParallel, but becomes non-deterministic once I use DistributedDataParallel.

My code for deterministic behavior is:

cudnn.benchmark = False
cudnn.deterministic = True
torch.cuda.manual_seed_all(123)
DataLoader(…, worker_init_fn=random.seed)
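
Spelled out a bit more fully, what I do is roughly the following (the toy dataset, batch size, and the helper names seed_everything and worker_init_fn are placeholders for illustration, not my exact code):

import random

import numpy as np
import torch
import torch.backends.cudnn as cudnn
from torch.utils.data import DataLoader, TensorDataset

SEED = 123

def seed_everything(seed):
    # Seed every RNG in play: Python's stdlib, NumPy, and PyTorch (CPU + all GPUs).
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    cudnn.benchmark = False      # disable cuDNN autotuning, which picks algorithms non-deterministically
    cudnn.deterministic = True   # restrict cuDNN to deterministic kernels

def worker_init_fn(worker_id):
    # Give each DataLoader worker a distinct but reproducible seed.
    # (Passing random.seed directly, as above, seeds only the stdlib RNG
    # and uses the bare worker_id as the seed.)
    random.seed(SEED + worker_id)
    np.random.seed(SEED + worker_id)

seed_everything(SEED)
dataset = TensorDataset(torch.randn(64, 3))  # placeholder dataset
loader = DataLoader(dataset, batch_size=8, num_workers=2,
                    worker_init_fn=worker_init_fn)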

And my launch command:
python -m torch.distributed.launch
--master_port=$((RANDOM + 10000))

Does DistributedDataParallel need any additional tricks to get deterministic behavior?


DistributedDataParallel should be deterministic. All it does is apply allreduce to sync gradients across processes. Can you check whether the data loader is providing deterministic inputs?
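
For example, here is a minimal sketch of such a check (nothing DDP-specific; the toy dataset and the batch_fingerprint helper are made up for illustration):

import hashlib

import torch
from torch.utils.data import DataLoader, TensorDataset

def batch_fingerprint(loader, num_batches=3):
    # Hash the raw bytes of the first few batches; two runs fed identical
    # data in identical order produce identical digests.
    h = hashlib.sha256()
    for i, (x,) in enumerate(loader):
        if i >= num_batches:
            break
        h.update(x.numpy().tobytes())
    return h.hexdigest()

dataset = TensorDataset(torch.arange(64, dtype=torch.float32).reshape(64, 1))
loader = DataLoader(dataset, batch_size=8, shuffle=True,
                    generator=torch.Generator().manual_seed(123))

# Run this on every rank: if the digest changes between runs, the data
# pipeline (sampler, shuffling, augmentation), not DistributedDataParallel,
# is the source of the non-determinism.
print(batch_fingerprint(loader))

And if you are using a DistributedSampler, remember to call its set_epoch method at the start of every epoch; otherwise each epoch reuses the same shuffling order.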