I ran into a problem when using torch.distributed.all_reduce. I want to manually reduce and sum the gradients of all model parameters.
This is the first solution, which gives me the correctly reduced and summed results:

```python
for p in params:
    dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
```
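For context, this is roughly the kind of setup the loop assumes (a sketch only; the process group is already initialized, e.g. via torchrun, a backward pass has filled `p.grad`, and the helper name `sum_gradients` is just a placeholder):

```python
import torch
import torch.distributed as dist

def sum_gradients(params):
    # Sum each parameter's gradient across all ranks, in place.
    for p in params:
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            # Divide here instead if an average is wanted:
            # p.grad /= dist.get_world_size()
```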
However, the second solution below does not perform any reduction at all; after running it, the gradients still hold the same values as before the reduction.

```python
def sync_para(x):
    grad = x.grad
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)
    x.grad = grad

map(lambda x: sync_para(x), params)
```
Can the map function not be used here?
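My own guess (please correct me if this is wrong): in Python 3, map returns a lazy iterator, so sync_para is never actually executed unless the iterator is consumed. Forcing the evaluation, or simply keeping the explicit loop, should make the second version behave like the first:

```python
# map() is lazy in Python 3: sync_para only runs once the iterator is
# consumed, e.g. by wrapping it in list().
list(map(sync_para, params))

# Equivalent, and closer to the first solution:
for p in params:
    sync_para(p)
```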