We know that when calling loss.backward() in DDP mode, the gradients of the model on each device are all-reduced automatically.
In my case, the model on each device will follow a different procedure (e.g. training on a different data distribution), and I want to see the effects of this by evaluating each model on the validation set. After the evaluation step, the models on all devices should be synchronized as usual and training proceeds to the next round.
To realize this, I need to prevent the gradient reduction when calling loss.backward(). Is there a way to control the gradient reduction procedure manually? If so, how can I synchronize the model parameters efficiently afterwards?
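For reference, here is a minimal sketch of one way I imagine this could work, assuming PyTorch's DistributedDataParallel: its no_sync() context manager skips the gradient all-reduce for backward passes taken inside it, and torch.distributed.broadcast can later copy one rank's parameters to all other ranks. The single-process gloo setup, the tiny Linear model, and the choice of rank 0 as the broadcast source are all placeholders for illustration, not a definitive recipe.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process demo setup (gloo backend, world_size=1) so the
# sketch is runnable as-is; in real use the launcher sets this up.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

net = DDP(torch.nn.Linear(4, 2))
x, y = torch.randn(8, 4), torch.randn(8, 2)

# 1) Local-only backward: no_sync() skips the gradient all-reduce,
#    so each rank keeps its own gradients for this round.
with net.no_sync():
    loss = torch.nn.functional.mse_loss(net(x), y)
    loss.backward()
# NOTE: gradients accumulated under no_sync() are folded into the
# all-reduce of the first backward taken OUTSIDE the context, so
# step the optimizer and zero them before the next synced pass.

# ... per-rank optimizer.step() and validation would go here ...

# 2) Re-synchronize: broadcast one rank's parameters and buffers
#    to every other rank (src=0 here is an arbitrary choice).
with torch.no_grad():
    for p in net.parameters():
        dist.broadcast(p, src=0)
    for b in net.buffers():
        dist.broadcast(b, src=0)

dist.destroy_process_group()
```

If instead of keeping one rank's weights you want to combine the divergent models, you could replace the broadcast loop with dist.all_reduce on each parameter followed by division by the world size to average them.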