Autograd.grad with multiple loss functions

Hey, I have multiple loss functions, each with respect to a batch of data points. I want to save each gradient to a buffer. Here is my way of doing it:

buffer = []
for b in range(num_batch):
    weights.zero_grad()
    loss = loss_func(batch[b])
    loss.backward()
    buffer.append(weights.grad)

However, I'm wondering if there is any loop-free way of doing this?

If you use the same weights, you will have to do the for loop, I'm afraid.
Note that in your code, you want to do weights.grad.clone() on the last line. All changes to .grad are in-place, so after you do weights.zero_grad(), your buffer will contain only zeros if you don't clone.
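
For reference, here is a minimal, self-contained sketch of the corrected loop; the tensor shapes and the toy loss_func / batch below are made up purely for illustration:

import torch

# hypothetical setup: a single weight tensor, a toy loss and a list of batches
weights = torch.randn(10, requires_grad=True)

def loss_func(x):
    return (weights * x).sum()

batch = [torch.randn(10) for _ in range(4)]
num_batch = len(batch)

buffer = []
for b in range(num_batch):
    if weights.grad is not None:
        weights.grad.zero_()              # reset the accumulated gradient
    loss = loss_func(batch[b])
    loss.backward()
    buffer.append(weights.grad.clone())   # clone, otherwise the buffer ends up all zeros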

Each loss_func(batch[b]) is differentiated w.r.t. the same set of parameters.

Thanks for pointing out the clone() part.

Also, for a deep network's parameters, what is the best structure to save the most recent iteration's gradients, such that each gradient can also be accessed through something like an index (a list isn't the best choice here, I think)? I don't need the whole history, only the gradient information from the last iteration.

For more details, I'm trying to implement Algorithm 1 in this paper.

Hi,

I'm afraid you will have to do the bookkeeping by hand and potentially implement a new optimizer.
As an example, you can look at how RMSprop handles such bookkeeping.
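
As a rough illustration (not the actual RMSprop code), here is a toy optimizer sketch that stashes each parameter's most recent gradient in the per-parameter state dict, which is the same place RMSprop keeps its running averages; the class name and the plain SGD update are made up for this example:

import torch
from torch.optim import Optimizer

class LastGradSGD(Optimizer):
    def __init__(self, params, lr=0.01):
        super().__init__(params, dict(lr=lr))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                # overwrite the stored gradient every iteration, so only the
                # last iteration's gradient is kept, keyed by the parameter itself
                self.state[p]["last_grad"] = p.grad.clone()
                # plain SGD update, just so the sketch is a complete optimizer
                p.add_(p.grad, alpha=-group["lr"])

After a step(), optimizer.state[p]["last_grad"] gives the gradient of parameter p from the last iteration, which is the kind of per-parameter, index-like access you were asking about.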

Thanks for the link!