Hi guys,
I've been trying to figure out whether model_with_ddp and model_without_ddp share the same state dict.
For example, if I create model_without_ddp via
model_without_ddp = model_with_ddp.module
and then update the parameters of model_with_ddp with some loss and some optimizer, will model_without_ddp change correspondingly?
Many thanks,
Yiming
Changing the internal parameters via ddp.module should be reflected in the parent DDP module on this rank. However, you risk the models on different ranks diverging if you do not properly communicate the gradients between all ranks.
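A rough way to check this locally is the minimal sketch below. It makes several assumptions that are not from this thread: a single-process gloo process group on CPU with a temporary file for rendezvous, a toy nn.Linear model, and plain SGD. The names same_object, changed and shared_storage are only illustrative. With world_size=1 it says nothing about synchronization across ranks; it only looks at whether the wrapper and ddp.module point at the same parameters within one process.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumption: one CPU process, gloo backend, file-based rendezvous.
rendezvous_file = os.path.join(tempfile.mkdtemp(), "ddp_init")
dist.init_process_group(
    backend="gloo",
    init_method=f"file://{rendezvous_file}",
    rank=0,
    world_size=1,
)

model = nn.Linear(4, 2)  # toy model, purely for illustration
model_with_ddp = DDP(model)
model_without_ddp = model_with_ddp.module

# Is .module the very object that was wrapped?
same_object = model_without_ddp is model

# Snapshot the weights as seen through the unwrapped handle.
before = model_without_ddp.weight.detach().clone()

# Update only through the DDP wrapper.
optimizer = torch.optim.SGD(model_with_ddp.parameters(), lr=0.1)
loss = model_with_ddp(torch.ones(8, 4)).sum()
loss.backward()
optimizer.step()

# Did the unwrapped handle see the update?
changed = not torch.equal(before, model_without_ddp.weight)

# Do both views expose parameters backed by the same memory?
shared_storage = all(
    p_ddp.data_ptr() == p_plain.data_ptr()
    for p_ddp, p_plain in zip(
        model_with_ddp.parameters(), model_without_ddp.parameters()
    )
)

print(same_object, changed, shared_storage)

dist.destroy_process_group()
```

If all three flags come out True, the wrapper holds a reference to the module rather than a copy, so on a given rank an update made through either handle is visible through the other.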
Hi Piotr,
Sorry for my poor explanation.
How about the reverse? I mean, if model_without_ddp is defined (via ddp.module) and kept in memory, and the parallelized model later changes its parameters, will this also be reflected in the previously defined model_without_ddp?
Or, do they share the same memory?
Many thanks,
Yiming