When I run the model on multiple GPUs, register_hook is invalid

I want to save the gradients of internal variables through register_hook() or retain_grad().
When I run the model on a single GPU, it works.
But when I run the model on multiple GPUs by wrapping it in nn.DataParallel, it no longer works.
Can anyone help me?
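Roughly what I am doing (a minimal sketch, not my real model; ToyNet and saved_grads are placeholder names):

```python
import torch
import torch.nn as nn

class ToyNet(nn.Module):
    """Toy model that saves the gradient of an intermediate activation."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 16)
        self.fc2 = nn.Linear(16, 1)
        self.saved_grads = {}          # filled by the hook during backward

    def forward(self, x):
        h = torch.relu(self.fc1(x))    # internal variable whose grad I want
        # The hook writes into *this* module instance. Under DataParallel,
        # "this" is a per-device replica that is discarded after forward,
        # so saved_grads on the base module stays empty.
        h.register_hook(lambda grad: self.saved_grads.update({"fc1_out": grad.detach()}))
        return self.fc2(h)

model = ToyNet()                       # single GPU / CPU: works as expected
model(torch.randn(4, 8)).sum().backward()
print(model.saved_grads["fc1_out"].shape)   # torch.Size([4, 16])
```

On a single GPU (or CPU) saved_grads is filled after backward; once the model is wrapped in nn.DataParallel it stays empty.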

Based on the comments in the DataParallel documentation: "In each forward, `module` is replicated on each device, so any updates to the running module in forward will be lost. For example, if `module` has a counter attribute that is incremented in each forward, it will always stay at the initial value because the update is done on the replicas, which are destroyed after forward. However, `DataParallel` guarantees that the replica on device[0] will have its parameters and buffers sharing storage with the base parallelized `module`. So in-place updates to the parameters or buffers on device[0] will be recorded."

That means gradients of internal variables cannot be saved on the replicas created for the other GPUs; only the replica on device[0] shares storage with the base module. If you want to sync buffers, you can try the DistributedDataParallel package instead.
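If what you actually need is the gradients of internal variables, DistributedDataParallel is worth a look: each process owns its own, non-replicated copy of the module, so hooks registered in forward keep working per rank. A minimal sketch under those assumptions (ToyNet, the worker function, and the port are made up; one process per GPU with the nccl backend):

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 16)
        self.fc2 = nn.Linear(16, 1)
        self.saved_grads = {}

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        # Each process owns its module (no replication), so the hook and
        # the dict it writes to both survive the forward pass.
        h.register_hook(lambda g: self.saved_grads.update({"fc1_out": g.detach()}))
        return self.fc2(h)

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = ToyNet().cuda(rank)
    ddp = DDP(model, device_ids=[rank])
    ddp(torch.randn(4, 8, device=rank)).sum().backward()
    print(rank, model.saved_grads["fc1_out"].shape)   # filled on every rank

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```

By default DDP also broadcasts buffers from rank 0 at each forward (broadcast_buffers=True), which covers the buffer-syncing part.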