Different seeds, but weights are still initialized the same across processes in DDP

I am setting a different seed for each process (distributed training setting), but the model's weights are still the same on all processes. Is this expected behavior?

PS: I was under the impression that since weight initialization depends on the seed, different seeds would lead to different initial weights on each process.

Yes. At the time of DDP wrapping, the parameters and buffers (i.e., model.state_dict()) on rank0 are broadcast to all other ranks. So although the weights on different ranks were initialized from different seeds, they all share the same starting point (rank0's state) after DDP construction.
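You can see this directly with a minimal sketch (my assumptions: two processes launched with `torchrun --nproc_per_node=2`, the `gloo` backend on CPU, and a toy `nn.Linear` model; none of this comes from the original post). Each rank seeds differently, prints a parameter before wrapping, wraps with DDP, and prints again:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Assumes RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT are set, e.g. by torchrun.
    dist.init_process_group(backend="gloo")
    rank = dist.get_rank()

    # Different seed per rank -> different initial weights per rank.
    torch.manual_seed(rank)
    model = torch.nn.Linear(4, 2)
    print(f"[rank {rank}] before DDP: {model.weight.flatten()[:3]}")

    # DDP construction broadcasts rank0's parameters and buffers to all ranks,
    # so every rank now starts from rank0's initialization.
    ddp_model = DDP(model)
    print(f"[rank {rank}] after DDP:  {ddp_model.module.weight.flatten()[:3]}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Before wrapping, each rank prints different values (different seeds); after wrapping, every rank prints the same values as rank0.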


sio277 already gave the right answer. This behavior is expected.

DDP must keep the weights on different processes in sync from the start: starting from the same initial weights and applying synced (averaged) gradients yields the same updated weights on every rank, which makes distributed training mathematically equivalent to the sequential (single-process) version. Otherwise, distributed training would be mathematically incorrect.
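Here is a small follow-up sketch of that argument (same launch assumptions as above; the data, model, and learning rate are arbitrary choices of mine): even when each rank sees different data, DDP all-reduces (averages) the gradients during `backward()`, so every rank applies the same update and the weights remain identical after the optimizer step.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="gloo")
rank = dist.get_rank()

torch.manual_seed(rank)                # different init per rank...
model = DDP(torch.nn.Linear(4, 1))     # ...overwritten by rank0's state here
opt = torch.optim.SGD(model.parameters(), lr=0.1)

torch.manual_seed(1234 + rank)         # each rank sees different data
x, y = torch.randn(8, 4), torch.randn(8, 1)

loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()                        # DDP averages gradients across ranks here
opt.step()

# The averaged gradient is identical on every rank, so the updated weights
# match across ranks even though the local losses differ.
print(f"[rank {rank}] weight after step: {model.module.weight.flatten()}")
dist.destroy_process_group()
```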
