How to get deterministic behavior with DistributedDataParallel?

Hello, my code behaves deterministically without DistributedDataParallel, but becomes non-deterministic once I use DistributedDataParallel.

My code for deterministic behavior is:

cudnn.benchmark = False
cudnn.deterministic = True
torch.cuda.manual_seed_all(123)
DataLoader(…, worker_init_fn=random.seed)
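
Spelled out a bit more fully, what I do is roughly the following (the toy dataset, batch size, and the helper names seed_everything and worker_init_fn are placeholders for illustration, not my exact code):

import random

import numpy as np
import torch
import torch.backends.cudnn as cudnn
from torch.utils.data import DataLoader, TensorDataset

SEED = 123

def seed_everything(seed):
    # Seed every RNG in play: Python's stdlib, NumPy, and PyTorch (CPU + all GPUs).
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    cudnn.benchmark = False      # disable cuDNN autotuning, which picks algorithms non-deterministically
    cudnn.deterministic = True   # restrict cuDNN to deterministic kernels

def worker_init_fn(worker_id):
    # Give each DataLoader worker a distinct but reproducible seed.
    # (Passing random.seed directly, as above, seeds only the stdlib RNG
    # and uses the bare worker_id as the seed.)
    random.seed(SEED + worker_id)
    np.random.seed(SEED + worker_id)

seed_everything(SEED)
dataset = TensorDataset(torch.randn(64, 3))  # placeholder dataset
loader = DataLoader(dataset, batch_size=8, num_workers=2,
                    worker_init_fn=worker_init_fn)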

And my launch command:
python -m torch.distributed.launch
--master_port=$((RANDOM + 10000))

Does DistributedDataParallel need any additional tricks to get deterministic behavior?


DistributedDataParallel should be deterministic. All it does is apply allreduce to sync gradients across processes. Can you check whether the data loader is providing deterministic inputs?
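
For example, here is a minimal sketch of such a check (nothing DDP-specific; the toy dataset and the batch_fingerprint helper are made up for illustration):

import hashlib

import torch
from torch.utils.data import DataLoader, TensorDataset

def batch_fingerprint(loader, num_batches=3):
    # Hash the raw bytes of the first few batches; two runs fed identical
    # data in identical order produce identical digests.
    h = hashlib.sha256()
    for i, (x,) in enumerate(loader):
        if i >= num_batches:
            break
        h.update(x.numpy().tobytes())
    return h.hexdigest()

dataset = TensorDataset(torch.arange(64, dtype=torch.float32).reshape(64, 1))
loader = DataLoader(dataset, batch_size=8, shuffle=True,
                    generator=torch.Generator().manual_seed(123))

# Run this on every rank: if the digest changes between runs, the data
# pipeline (sampler, shuffling, augmentation), not DistributedDataParallel,
# is the source of the non-determinism.
print(batch_fingerprint(loader))

And if you are using a DistributedSampler, remember to call its set_epoch method at the start of every epoch; otherwise each epoch reuses the same shuffling order.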