torch/nn/parallel/distributed.py registers a hook like this:
def reduction_fn_nccl():
    ...

# Now register the reduction hook on the parameters
for p in self.module.parameters():
    if not p.requires_grad:
        continue

    def allreduce_hook(*unused):
        Variable._execution_engine.queue_callback(reduction_fn_nccl)

    p.register_hook(allreduce_hook)
I do not understand why it does not just call p.register_hook(reduction_fn_nccl) directly.
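To make the question concrete, here is a minimal sketch of the alternative I have in mind (my own hypothetical rewrite, not the actual PyTorch code), where the reduction function is registered directly as each parameter's gradient hook:

# Hypothetical alternative -- NOT the actual torch/nn/parallel/distributed.py code.
# Here the reduction would run as soon as each parameter's gradient is produced,
# rather than being queued via Variable._execution_engine.queue_callback, which
# defers the callback until the backward pass finishes.
for p in self.module.parameters():
    if not p.requires_grad:
        continue

    # register_hook passes the gradient tensor to the hook, so a small
    # wrapper is needed around the zero-argument reduction_fn_nccl.
    p.register_hook(lambda grad: reduction_fn_nccl())

Is there a reason the indirection through the callback queue is required instead of something like this?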