Are the model parameters identical on every GPU?

For example, if I train on multiple GPUs, is the model on each GPU exactly the same after each training epoch?

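Yes, as explained in the Internal Design section of the DDP notes. The DDP constructor broadcasts the model state from rank 0 to all other processes, and during every backward pass the gradients are averaged across processes with all-reduce. Each replica therefore starts from the same parameters and applies the same update, so, provided every process uses the same optimizer settings, the models stay in sync after every optimizer step, not only after each epoch.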
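To see why this keeps the replicas in sync, here is a minimal sketch in plain Python. It is a toy simulation, not DDP itself: the function names (`local_gradients`, `all_reduce_mean`, `train`), the linear model, and the data are all made up for illustration, and `all_reduce_mean` only stands in for the real all-reduce collective.

```python
# Toy simulation of the update rule described above (no PyTorch needed).
# Model: y = w * x + b, loss = mean squared error.

def local_gradients(params, shard):
    """Gradient of the MSE loss computed on one replica's own data shard."""
    w, b = params
    n = len(shard)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in shard) / n
    return [grad_w, grad_b]

def all_reduce_mean(grads_per_replica):
    """Stand-in for all-reduce: every replica gets the same averaged gradient."""
    world_size = len(grads_per_replica)
    return [sum(g) / world_size for g in zip(*grads_per_replica)]

def train(shards, steps=50, lr=0.01):
    # Every replica starts from the same values
    # (in DDP, rank 0's state is broadcast at construction time).
    replicas = [[0.5, -0.3] for _ in shards]
    for _ in range(steps):
        # Each replica sees different data, so local gradients differ.
        grads = [local_gradients(p, s) for p, s in zip(replicas, shards)]
        avg = all_reduce_mean(grads)
        # Each replica applies the identical averaged gradient locally.
        replicas = [[p - lr * g for p, g in zip(params, avg)]
                    for params in replicas]
    return replicas

shards = [
    [(1.0, 3.0), (2.0, 5.0)],  # data seen only by "GPU 0"
    [(3.0, 7.0), (4.0, 9.0)],  # data seen only by "GPU 1"
]
replicas = train(shards)
print(replicas[0] == replicas[1])  # True
```

Even though each replica computes a different local gradient, all of them apply the same averaged gradient to the same starting parameters, so the copies never drift apart.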
