Quickly get individual gradients (not sum of gradients) of all network outputs

Right now, I’m doing:

output_gradients = []

for output in net_outputs:
    tmp_grad = {}
    net.zero_grad()
    output.backward(retain_graph=True)
    for name, param in net.named_parameters():
        # clone, otherwise the stored tensors are zeroed in place by the next zero_grad()
        tmp_grad[name] = param.grad.clone()
    output_gradients.append(tmp_grad)

Since I have to call backward() on each output separately, the backward passes are not parallelized, and the code is pretty slow.

Is there a faster way? Thanks!


By default we only support accumulated gradients, so this is not easy to do.
If you don't have memory constraints, you can use the torch.autograd.grad interface to compute separate gradients in one shot. (They won't be in .grad, but will be returned explicitly, so some manual bookkeeping is needed.)
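For concreteness, here is a minimal sketch of that suggestion, assuming a toy nn.Linear model and a list of scalar net_outputs standing in for the original poster's setup:

import torch
import torch.nn as nn

# Toy stand-ins for the original poster's net and per-example scalar outputs.
net = nn.Linear(3, 2)
x = torch.randn(5, 3)
net_outputs = list(net(x).sum(dim=1))  # one scalar output per input row

params = list(net.parameters())
output_gradients = []
for output in net_outputs:
    # torch.autograd.grad returns the gradients explicitly (as a tuple
    # matching params) instead of accumulating them into .grad, so no
    # zero_grad() or cloning is needed.
    grads = torch.autograd.grad(output, params, retain_graph=True)
    output_gradients.append(grads)

Note that this still calls autograd once per output, so it does not by itself parallelize the backward passes; it just drops the .grad bookkeeping.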


Thanks for the tips! Though we were able to rewrite the math to avoid doing this.

@smth Can you elaborate more on how to use the torch.autograd.grad interface to compute separate gradients in one shot? Thanks.

grads = torch.autograd.grad(loss, parameters, retain_graph=True)
This returns the gradients as a tuple that exactly matches the parameters provided, so you don't need a for loop over the parameters to collect them as in the code example above. The tuple has the same size and order as the parameters you pass in.
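As a small usage sketch (the toy net and loss below are placeholders, not from the thread), the returned tuple can be zipped back with named_parameters() to rebuild a name-to-gradient dict like the one in the original snippet:

import torch
import torch.nn as nn

# Toy stand-ins for the net and scalar loss discussed above.
net = nn.Linear(3, 2)
loss = net(torch.randn(5, 3)).sum()

named = list(net.named_parameters())
grads = torch.autograd.grad(loss, [p for _, p in named])
# grads has the same size and order as the parameters passed in,
# so zipping with the names recovers a name -> gradient dict.
grad_dict = {name: g for (name, _), g in zip(named, grads)}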


Is it possible to do backward propagation in a parallel way? Thanks!


Dear @evcu, in your example loss is a scalar and parameters is a collection of tensors. However, the question is about multiple losses, and computing their gradients without summation. For me, @smth's reference to torch.autograd.grad wasn't helpful either. Its docs start with: "Computes and returns the sum of gradients of outputs w.r.t. the inputs", and that seems to be what it does. Under what configuration does grad return individual gradients (at the expense of memory)?
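To make the summation behaviour concrete, here is a small assumed example (not from the thread): for a vector output, a single grad call gives the sum over outputs, and individual gradients still need one call per output.

import torch

x = torch.randn(3, requires_grad=True)
y = 2 * x                      # vector output; the Jacobian is 2 * I

# Passing the whole vector (with grad_outputs of ones) sums the
# per-output gradients:
(summed,) = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y),
                                retain_graph=True)
# summed == tensor([2., 2., 2.])

# The gradient of a single output still requires its own call:
(row0,) = torch.autograd.grad(y[0], x)
# row0 == tensor([2., 0., 0.])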

As an aside, this is a pretty useful feature for differential privacy.
