Understanding a Model's Parameters in DistributedDataParallel

Hi guys,

I’ve been trying to figure out whether model_with_ddp and model_without_ddp share the same state dict or not.

For example, if I create model_without_ddp via
model_without_ddp = model_with_ddp.module
and then update model_with_ddp's parameters through some loss and optimizer, will model_without_ddp change correspondingly?

Many thanks,
Yiming

Changing the internal parameters via ddp.module should be reflected in the parent DDP module on this rank. However, you run the risk of the models diverging across ranks if you are not properly communicating the gradients between all ranks.
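
A minimal sketch of this (single process, world_size=1, gloo backend, using a hypothetical toy model just for illustration) shows that the DDP wrapper and the underlying module hold the very same parameter tensors, so an optimizer step on one is visible through the other:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process setup just for illustration (world_size=1, gloo backend).
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29500"
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(4, 2)
model_with_ddp = DDP(model)
model_without_ddp = model_with_ddp.module

# .module returns the original module object, not a copy.
print(model_without_ddp is model)  # True

# The parameters are the very same tensors (same storage).
p_ddp = next(model_with_ddp.parameters())
p_raw = next(model_without_ddp.parameters())
print(p_ddp.data_ptr() == p_raw.data_ptr())  # True

# An optimizer step on the DDP parameters is therefore visible
# through model_without_ddp as well.
opt = torch.optim.SGD(model_with_ddp.parameters(), lr=0.1)
loss = model_with_ddp(torch.randn(8, 4)).sum()
loss.backward()
opt.step()
print(torch.equal(p_ddp, p_raw))  # True: still the same tensor

dist.destroy_process_group()
```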

Hi Piotr,

Sorry for my poor explanation.
How about the reverse? I mean, if model_without_ddp is defined (via ddp.module) and stored in memory, and the parallelized model's parameters change later, will this also be reflected in the previously defined model_without_ddp?
Or do they share the same memory?
Many thanks,
Yiming
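
For what it's worth, the reverse direction can be checked the same way: ddp.module is a reference to the original module rather than a copy, so both names point at the same parameter tensors in memory. Continuing the sketch above (same hypothetical model_with_ddp / model_without_ddp):

```python
# model_without_ddp was created earlier, so later in-place updates
# made through the DDP wrapper are visible through it as well,
# because both names refer to the same Parameter objects.
with torch.no_grad():
    for p in model_with_ddp.parameters():
        p.add_(1.0)  # mutate the parameters via the DDP view

# Identity check: both views yield the exact same tensor object.
print(next(model_without_ddp.parameters())
      is next(model_with_ddp.parameters()))  # True
```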