torch/nn/parallel/distributed.py registers a hook like this:
def reduction_fn_nccl():
    ...

# Now register the reduction hook on the parameters
for p in self.module.parameters():
    if not p.requires_grad:
        continue

    def allreduce_hook(*unused):
        Variable._execution_engine.queue_callback(reduction_fn_nccl)

    p.register_hook(allreduce_hook)
I do not understand why it does not just call p.register_hook(reduction_fn_nccl) directly.
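To make the question concrete, here is a minimal sketch of the alternative I have in mind (my own hypothetical rewrite, not the actual PyTorch code), where the reduction function is registered directly as each parameter's gradient hook:

# Hypothetical alternative -- NOT the actual torch/nn/parallel/distributed.py code.
# Here the reduction would run as soon as each parameter's gradient is produced,
# rather than being queued via Variable._execution_engine.queue_callback, which
# defers the callback until the backward pass finishes.
for p in self.module.parameters():
    if not p.requires_grad:
        continue

    # register_hook passes the gradient tensor to the hook, so a small
    # wrapper is needed around the zero-argument reduction_fn_nccl.
    p.register_hook(lambda grad: reduction_fn_nccl())

Is there a reason the indirection through the callback queue is required instead of something like this?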