I am using torch.distributed to run federated learning. One worker trains an extra model besides the distributed model. However, I find that when the worker trains its own model, the gradients are averaged with the other workers' gradients for the main task. Can the `loss.backward()` for the extra model be kept unaffected by the gradient averaging? Thanks a lot in advance.
I'm afraid it cannot, as `.backward()` always accumulates gradients into the `.grad` field.
You can avoid this, though, by computing the gradients explicitly with `grads = torch.autograd.grad(loss, model.parameters())`, which returns them as a tuple of tensors instead of accumulating them into `.grad`.
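A minimal sketch of the idea, using a small stand-in `nn.Linear` model for the extra local model (the model, loss, and learning rate here are hypothetical, not from the original setup): `torch.autograd.grad` hands the gradients back as tensors, leaving every parameter's `.grad` field untouched, and you can then apply them manually.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the worker's extra local model.
model = nn.Linear(4, 2)
x = torch.randn(8, 4)
loss = model(x).pow(2).mean()

# Gradients are returned as a tuple, ordered like model.parameters(),
# instead of being accumulated into the .grad fields.
grads = torch.autograd.grad(loss, list(model.parameters()))

# The .grad fields were never populated.
print(all(p.grad is None for p in model.parameters()))  # True

# Apply a manual SGD step with the returned gradients (hypothetical lr).
lr = 0.1
with torch.no_grad():
    for p, g in zip(model.parameters(), grads):
        p -= lr * g
```

Because `.grad` is never written, whatever averaging hooks act on the `.grad` fields of the distributed model have nothing to interfere with here.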