How to get deterministic behavior with DistributedDataParallel?

Hello, my code behaves deterministically without DistributedDataParallel, but becomes non-deterministic once I wrap the model in DistributedDataParallel.

Here is the code I use to get deterministic behavior:

import random
import torch
import torch.backends.cudnn as cudnn

cudnn.benchmark = False      # disable cuDNN autotuning, which can pick algorithms non-deterministically
cudnn.deterministic = True   # restrict cuDNN to deterministic kernels
random.seed(123)
torch.manual_seed(123)
torch.cuda.manual_seed_all(123)
torch.utils.data.DataLoader(…, worker_init_fn=random.seed)
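
As an aside, passing random.seed directly as worker_init_fn only reseeds Python's random module with the bare worker id; NumPy and torch generators inside the workers are untouched. Below is a minimal sketch of a worker_init_fn that seeds all three from a fixed base seed (the name seed_worker, the base value 123, and the toy dataset are illustrative, not from the original post):

import random
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id):
    # Derive a distinct, reproducible seed for each worker from a fixed base.
    worker_seed = 123 + worker_id
    random.seed(worker_seed)
    np.random.seed(worker_seed)
    torch.manual_seed(worker_seed)

dataset = TensorDataset(torch.arange(100.).unsqueeze(1))  # toy dataset for illustration
loader = DataLoader(dataset, batch_size=8, num_workers=4, worker_init_fn=seed_worker)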

And my launch command:
python -m torch.distributed.launch \
    --nproc_per_node=4 \
    --master_port=$((RANDOM + 10000)) \
    train.py

Does DistributedDataParallel need any additional tricks to get deterministic behavior?

DistributedDataParallel should be deterministic. All it does is apply allreduce to sync gradients across processes. Can you check whether the data loader is providing deterministic inputs on each rank?
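
A common source of non-determinism in the input pipeline is the sampler. Below is a minimal sketch of a seeded DistributedSampler setup; the toy dataset, the seed value, and the explicit num_replicas/rank arguments are illustrative (in train.py they would normally be inferred from the initialized process group). In recent PyTorch versions DistributedSampler accepts a seed argument, and set_epoch must be called at the start of every epoch so the shuffle order is reproducible and identical on every rank:

import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(1000.).unsqueeze(1))  # toy dataset for illustration

# num_replicas and rank are passed explicitly here so the sketch runs
# standalone; inside a launched process they are inferred automatically.
sampler = DistributedSampler(dataset, num_replicas=4, rank=0, shuffle=True, seed=123)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

for epoch in range(2):
    # set_epoch makes each epoch's shuffle deterministic across runs and ranks.
    sampler.set_epoch(epoch)
    for (batch,) in loader:
        pass  # training step goes here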