# Will the gradients be collected correctly if I modify them manually with `tensor.register_hook` in DistributedDataParallel training?

I want to modify gradients manually with `tensor.register_hook()`.
I’ve tested that when training on one GPU, the gradients are collected correctly.
But I’m not sure whether the gradients will also be collected correctly when training with DDP.

```python
def modify_grad(self, g):
    return g * 3
```

Say I multiply the gradient on each GPU by 3. Will the reduced gradient broadcast back to each GPU then be 3 times the original?
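For context, here is the single-device behaviour I’m relying on, as a minimal runnable sketch (the tensors and the factor 3 are just my illustration):

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
y = x * x                          # dy/dx = 2x = 4 at x = 2
y.register_hook(lambda g: g * 3)   # scale the gradient flowing through y by 3
y.backward()
print(x.grad)                      # tensor([12.]) : 3 * (dy/dx)
```

On one GPU (or CPU) the hook fires during `backward()` and the accumulated gradient is exactly 3 times the unmodified one.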

This topic seems to be related to your question and also has a code example.

I did some tests. `Tensor.register_hook()` does work with DDP. I first wrap my model with DDP, then modify the grad manually with a custom function registered in the `forward` method of my model:

Here are the gradients after calling `loss.backward()`:
You can see the gradients are exactly 3 times the unmodified values, which means `.register_hook()` works well with DDP.