Quickly get individual gradients (not sum of gradients) of all network outputs

Right now, I’m doing:

output_gradients = []

for output in net_outputs:
    net.zero_grad()                      # clear accumulated gradients
    output.backward(retain_graph=True)   # one backward pass per output
    tmp_grad = {}
    for name, param in net.named_parameters():
        tmp_grad[name] = param.grad.clone()  # clone, since .grad is reused
    output_gradients.append(tmp_grad)

Since I have to call backward on each output, the backward passes are not parallelized, so the code is pretty slow.

Is there a faster way? Thanks!


By default we only support accumulated gradients, so this is not easy to do.
If you don't have memory constraints, you can use the torch.autograd.grad interface to compute separate gradients in one shot. (They won't be in .grad, but will be explicitly returned, so some manual bookkeeping is needed.)
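A minimal sketch of what this means, assuming a toy linear model (not from the thread): torch.autograd.grad returns the gradients explicitly as a tuple instead of accumulating them into each parameter's .grad field, so you keep track of them yourself.

```python
import torch

# Toy model and data, purely illustrative.
net = torch.nn.Linear(3, 2)
x = torch.randn(4, 3)
loss = net(x).sum()

params = list(net.parameters())

# Gradients are returned explicitly, one entry per parameter, in order.
grads = torch.autograd.grad(loss, params)

# Manual bookkeeping, e.g. keyed by parameter name.
grad_dict = {name: g for (name, _), g in zip(net.named_parameters(), grads)}

# .grad on the parameters is left untouched (still None here).
assert all(p.grad is None for p in params)
```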


Thanks for the tips! Though we were able to rewrite the math to avoid doing this.

@smth Can you elaborate more on how to use the torch.autograd.grad interface to compute separate gradients in one shot? Thanks.

grads = torch.autograd.grad(loss, parameters, retain_graph=True)

would return the gradients as a tuple matching exactly the parameters provided, so you don't need a for loop to collect the gradients as in the code example above. Same size, same order as the parameters provided.
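To illustrate the one-to-one correspondence (toy model assumed, not from the thread): each entry of the returned tuple has the same shape as the parameter it pairs with, in the same order.

```python
import torch

# Illustrative toy model.
net = torch.nn.Linear(3, 2)
loss = net(torch.randn(5, 3)).sum()

params = list(net.parameters())
grads = torch.autograd.grad(loss, params, retain_graph=True)

# The i-th gradient corresponds to the i-th parameter, shape for shape.
for p, g in zip(params, grads):
    assert g.shape == p.shape
```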


Is it possible to do backward propagation in parallel way? Thanks!


Dear @evcu, in your example, loss is scalar and parameters is a vector. However, the question is for multiple losses, and computing these without summation. For me, @smth’s reference to torch.autograd.grad wasn’t helpful either. Its docs start with: “Computes and returns the sum of gradients of outputs w.r.t. the inputs”, and that seems to be what it does. Under what configuration does grad return individual gradients (at the expense of memory)?
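To make the distinction concrete, here is one configuration that does yield individual gradients (a sketch, assuming a toy model): since torch.autograd.grad sums over its outputs, you call it once per scalar output, keeping the graph alive with retain_graph=True. This trades memory and repeated backward passes for per-output gradients.

```python
import torch

# Toy model with several scalar outputs, purely illustrative.
net = torch.nn.Linear(3, 1)
x = torch.randn(4, 3)
outputs = net(x).squeeze(1)  # 4 scalar outputs

params = list(net.parameters())

# One grad call per output; retain_graph=True lets us reuse the graph.
per_output_grads = [
    torch.autograd.grad(out, params, retain_graph=True)
    for out in outputs
]

# per_output_grads[i][j] is d(outputs[i]) / d(params[j]) -- no summation.
assert len(per_output_grads) == 4
```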

As an aside, this is a pretty useful feature for differential privacy.