For multi-GPU training on a single machine, the function gather(outputs, target_device, dim=0) in pytorch/torch/nn/parallel/scatter_gather.py supports autograd.
For multi-machine distributed training, the function gather(tensor, **kwargs) in pytorch/torch/distributed/__init__.py does not support autograd.
Is my understanding correct?
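For context, here is a minimal CPU-only sketch of what I mean by "supports autograd". It uses torch.cat as a stand-in for the cross-device concatenation that nn.parallel.gather performs (as I understand it, the real gather requires CUDA tensors, so this is only an analogy, not the actual call):

```python
import torch

# CPU-only stand-in: torch.cat mimics the concatenation that
# nn.parallel.gather performs across device replicas. The point
# is that the operation is recorded on the autograd graph, so
# backward() propagates gradients to each input.
x = torch.ones(2, 3, requires_grad=True)
y = torch.ones(2, 3, requires_grad=True)

out = torch.cat([x, y], dim=0)   # autograd records this op
out.sum().backward()             # gradients flow back to x and y

print(out.grad_fn is not None)  # True: the op is on the autograd graph
print(x.grad is not None)       # True: backward reached the inputs
```

By contrast, my understanding is that calling torch.distributed.gather produces tensors with no grad_fn, so no gradients flow back through the communication.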