In the DDP docs, there is a note warning that the user should never try to change the model's parameters:
You should never try to change your model's parameters after wrapping up your model with DistributedDataParallel. Because, when wrapping up your model with DistributedDataParallel, the constructor of DistributedDataParallel will register the additional gradient reduction functions on all the parameters of the model itself at the time of construction. If you change the model's parameters afterwards, gradient reduction functions no longer match the correct set of parameters.
I wonder whether it is allowed to just change the values of the model's parameters (without replacing the parameter tensors themselves) after wrapping the model with DDP.
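To make the question concrete, here is a minimal sketch of what I mean, assuming the default process group is already initialized (e.g. via torch.distributed.init_process_group under torchrun) and each process owns one GPU; the model and the 0.01 update are just placeholders:

```python
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes dist.init_process_group(...) has already been called and that
# local_rank identifies this process's GPU.
local_rank = dist.get_rank() % torch.cuda.device_count()
model = nn.Linear(10, 10).to(local_rank)
ddp_model = DDP(model, device_ids=[local_rank])

# The question: is an in-place update of parameter *values* like this allowed?
# The parameter tensors are not replaced, so the reduction hooks registered by
# the DDP constructor would still point at the same tensor objects.
with torch.no_grad():
    for p in ddp_model.parameters():
        p.add_(0.01)  # change values only; each p stays the same tensor object
```

In other words, the set of parameters (and their identities) stays exactly as it was at construction time; only the stored values are overwritten in place.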