If you use the same weights, you will have to use the for loop, I’m afraid.
Note that in your code, you want to do weights.grad.clone() on the last line. All changes to .grad happen in place, so after you zero the gradients, your saved buffer will contain only zeros if you don’t clone.
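A minimal sketch of the difference (the tensor name `w` is just a placeholder for your weights):

```python
import torch

w = torch.randn(3, requires_grad=True)
w.sum().backward()          # d(sum)/dw is all ones

alias = w.grad              # shares storage with w.grad
snapshot = w.grad.clone()   # independent copy

w.grad.zero_()              # in-place zeroing, as zero_grad() does

print(alias)     # all zeros -- the alias was zeroed too
print(snapshot)  # tensor([1., 1., 1.]) -- the clone survived
```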
Also, I’m wondering: for a deep network’s parameters, what is the best structure to save the most recent iteration’s gradients, so that each gradient can still be looked up by something like an index (a list isn’t the best choice here, I think)? I don’t need the whole history, only the gradient information from the last iteration.
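One common structure for this is a dict keyed by parameter name, via model.named_parameters(). A sketch under that assumption (the model and `snapshot_grads` helper are made up for illustration):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

def snapshot_grads(model):
    # Clone each gradient so later zero_grad()/backward() calls
    # don't overwrite the saved copies in place.
    return {name: p.grad.clone()
            for name, p in model.named_parameters()
            if p.grad is not None}

model(torch.randn(2, 4)).sum().backward()
last_grads = snapshot_grads(model)   # overwrite this every iteration
print(last_grads["0.weight"].shape)  # torch.Size([8, 4])
```

Overwriting the dict each iteration keeps only the last step’s gradients, which matches what you describe.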
For more details: I’m trying to implement Algorithm 1 in this paper.
I’m afraid you will have to do the bookkeeping by hand, and potentially implement a new optimizer.
As an example, you can look at how RMSprop handles such bookkeeping.
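A stripped-down sketch of that pattern, not the real torch.optim.RMSprop, just the per-parameter self.state idiom it (and other torch.optim optimizers) uses to remember values between steps:

```python
import torch
from torch.optim import Optimizer

class RunningSquareSGD(Optimizer):
    """Toy optimizer (hypothetical name): keeps a running average of
    squared gradients per parameter in self.state, RMSprop-style."""

    def __init__(self, params, lr=1e-2, alpha=0.99, eps=1e-8):
        super().__init__(params, dict(lr=lr, alpha=alpha, eps=eps))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]        # per-parameter dict
                if len(state) == 0:          # lazily initialize on first step
                    state["square_avg"] = torch.zeros_like(p)
                sq = state["square_avg"]
                # sq <- alpha * sq + (1 - alpha) * grad^2
                sq.mul_(group["alpha"]).addcmul_(p.grad, p.grad,
                                                 value=1 - group["alpha"])
                # p <- p - lr * grad / (sqrt(sq) + eps)
                p.addcdiv_(p.grad, sq.sqrt().add_(group["eps"]),
                           value=-group["lr"])
```

You could store your paper’s per-parameter quantities (e.g. the previous iteration’s gradient) in the same self.state dict instead of a running average.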