Quick one - is it necessary to use a manual seed when using DDP to ensure all local models initialise with the same parameters? I have been looking and seen examples that do and examples that don’t but nothing definitive.
Many thanks!
No, DDP broadcasts the state_dict from rank 0 to all other processes when the model is wrapped, so you won't need to seed the code for the parameter initialization, but you might need it for other use cases.
From the DDP Internal Design:
The DDP constructor takes a reference to the local module, and broadcasts `state_dict()` from the process with rank 0 to all other processes in the group to make sure that all model replicas start from the exact same state. Then, each DDP process creates a local `Reducer`, which later will take care of the gradients synchronization during the backward pass. …
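To make the broadcast concrete, here is a minimal single-process sketch of what the DDP constructor effectively does: two "ranks" build their own differently initialized local model, then rank 1 adopts rank 0's `state_dict`, after which the replicas match exactly. (This only mirrors the broadcast semantics; real DDP does it over the process group.)

```python
import torch
import torch.nn as nn

# Each process builds its own local model; without a shared manual seed
# the initial weights will generally differ across ranks.
model_rank0 = nn.Linear(4, 2)
model_rank1 = nn.Linear(4, 2)

# What the DDP constructor effectively does: broadcast rank 0's
# state_dict so every replica starts from the exact same state.
model_rank1.load_state_dict(model_rank0.state_dict())

# All parameters are now identical across the two replicas.
for p0, p1 in zip(model_rank0.parameters(), model_rank1.parameters()):
    assert torch.equal(p0, p1)
```

So even if each rank initializes with a different seed, the rank-0 parameters win once the model is wrapped in DDP.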