Loss backward with distributed setting

sherdencooper · October 23, 2019, 2:52pm

I am using torch.distributed to run federated learning. One worker trains an extra model in addition to the distributed model. However, I find that when this worker trains its own model, its gradients get averaged with the other workers’ gradients for the main task. Is there a way to call loss.backward() for the extra model without it being affected by the gradient averaging? Thanks a lot in advance.

albanD · October 23, 2019, 4:11pm

Hi,

I’m afraid not, since .backward() always accumulates into the .grad field of the parameters.
You can avoid this, though, by computing the gradients directly, which returns them as a tuple instead of writing them to .grad:
grad = torch.autograd.grad(loss, model.parameters())
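
To make the suggestion above concrete, here is a minimal single-process sketch. The names extra_model, inputs, targets and lr are made up for illustration, and the manual SGD step is only one way to use the returned gradients. It assumes the averaging in your setup works on the .grad fields (or is triggered by accumulation into them), so gradients that never land in .grad are left alone; whether that holds depends on how your averaging is implemented.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical "extra" model trained by a single worker only.
extra_model = nn.Linear(4, 2)
inputs = torch.randn(8, 4)
targets = torch.randn(8, 2)

loss = nn.functional.mse_loss(extra_model(inputs), targets)

# torch.autograd.grad returns the gradients as a tuple, one tensor per
# parameter, and does not accumulate anything into the .grad fields.
params = list(extra_model.parameters())
grads = torch.autograd.grad(loss, params)

# The .grad fields were never written, so they are still None here.
grad_fields_untouched = all(p.grad is None for p in params)

# Apply a plain SGD step by hand with the returned gradients
# (assumption: a simple fixed learning rate, no optimizer object).
lr = 0.1
with torch.no_grad():
    for p, g in zip(params, grads):
        p -= lr * g
```

If you would rather keep using an optimizer, one option is to assign each returned tensor to the matching p.grad right before optimizer.step(), after any averaging has already run.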