Accumulating Gradients Manually

I am currently implementing a Federated Learning algorithm and have a question about accumulating gradients for the model parameters.

To simulate the server model receiving gradients from all client models (akin to FedSGD) before each update, I have implemented the following code:

import torch

optim = torch.optim.SGD(network.parameters(), lr=0.001)
optim.zero_grad()

for client in client_list:
    # compute g_client as an output of torch.autograd.grad
    # ...
    for p, g in zip(network.parameters(), g_client):
        if p.grad is None:
            p.grad = g.detach().clone()  # first client: initialise .grad with a copy
        else:
            p.grad += g                  # later clients: accumulate in place

for p in network.parameters():  # average the gradient
    p.grad /= len(client_list)

optim.step()
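
For reference, each g_client is produced by something roughly like the sketch below (client_batch and client_loss_fn are placeholders for my actual per-client data and loss; the relevant point is just that torch.autograd.grad returns a tuple of gradient tensors without writing into p.grad):

inputs, targets = client_batch                    # hypothetical per-client batch
loss = client_loss_fn(network(inputs), targets)   # hypothetical per-client loss
g_client = torch.autograd.grad(loss, list(network.parameters()))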

Is this a proper way to accumulate the gradients from the different clients before updating my network module? This is my first time performing backpropagation and weight updates without relying on the usual “loss.backward(); optim.step()” pattern, so I am wondering whether there are any under-the-hood mechanisms I am missing for model training.
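
For comparison, my mental model is that the loop above should be equivalent to the familiar pattern where backward() sums gradients into p.grad across calls (a sketch; compute_client_loss is a placeholder for however the per-client loss is obtained):

optim.zero_grad()
for client in client_list:
    loss = compute_client_loss(network, client)   # placeholder per-client loss
    (loss / len(client_list)).backward()          # backward() adds into p.grad on each call
optim.step()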

Any response or advice would be welcome!