How do I modify gradients of DDP models after calling backward()?

I’m implementing an algorithm for which I need to modify the gradients after calling zero_grad() and backward() on multiple losses. I added a structure to the model that records the gradient of each back-prop, then call a method on that structure to modify the model’s gradients directly.
The problem is that this step runs on only one GPU. How do I synchronize the gradients of the model replicas across GPUs?
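For context, here is a minimal single-process sketch of the kind of in-place gradient modification described above (the halving step is a hypothetical stand-in for the actual algorithm; modify `p.grad` between backward() and the optimizer step):

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

opt.zero_grad()
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()

# Modify gradients in place after backward() but before opt.step().
# The scaling here is just a placeholder for the real modification.
with torch.no_grad():
    for p in model.parameters():
        if p.grad is not None:
            p.grad.mul_(0.5)

opt.step()
```

Under DDP the question is how to make such a modification, done on one rank, consistent across all ranks.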

You can call dist.broadcast() if you want to broadcast the gradients from rank 0 to the other ranks, or dist.all_reduce() if you want to sum the gradients across all ranks.
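A sketch of both options, assuming the process group is already initialized (the helper names are illustrative, not part of any API). Call one of these after backward() and after modifying the gradients, but before the optimizer step:

```python
import torch
import torch.distributed as dist

def sync_grads_broadcast(model, src=0):
    # Overwrite every rank's gradients with the ones from rank `src`.
    # Use this when only rank `src` performed the modification.
    for p in model.parameters():
        if p.grad is not None:
            dist.broadcast(p.grad, src=src)

def sync_grads_all_reduce(model):
    # Sum the gradients from all ranks, then average them so the
    # result matches DDP's usual mean-reduction convention.
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad.div_(world_size)
```

Note that DDP already all-reduces gradients during backward(), so if you modify them afterwards on one rank only, broadcast is usually what you want.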
