Hey, I have multiple loss functions, each computed on a batch of data points, and I want to save each gradient to a buffer. Here is my way of doing it:
```python
buffer = []
for b in range(num_batch):
    optimizer.zero_grad()
    loss = loss_func(batch[b])
    loss.backward()
    buffer.append(weights.grad)
```
However, I'm wondering if there is any non-loop way of doing this?
If you use the same weights, you will have to do the for loop, I'm afraid.
Note that in your code, you want to do weights.grad.clone() on the last line. All changes to .grad are in-place, so after you call zero_grad(), your buffer will contain only zeros if you don't clone.
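To illustrate the point about cloning, here is a minimal self-contained sketch of the corrected loop. The single weight tensor and the squared-error loss_func are hypothetical stand-ins for your actual model and losses:

```python
import torch

# Hypothetical setup: one weight vector and a per-batch squared-error loss.
weights = torch.randn(3, requires_grad=True)
batches = [torch.randn(3) for _ in range(4)]

def loss_func(x):
    return ((weights - x) ** 2).sum()

buffer = []
for b in batches:
    loss = loss_func(b)
    loss.backward()
    # .clone() is essential: .grad is updated in place, so appending the
    # tensor itself would leave every buffer entry zeroed after zero_().
    buffer.append(weights.grad.clone())
    weights.grad.zero_()
```

Each buffer entry now holds an independent copy of one batch's gradient, unaffected by later in-place zeroing.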
Yes, each loss_func(batch[b]) is differentiated w.r.t. the same set of parameters.
Thanks for pointing out the in-place .grad issue!
Also, for a deep network's parameters, what is the best data structure to save the most recent iteration's gradients, such that each gradient can still be accessed by something like an index? (A list isn't the best choice here, I think.) I don't need the full history, only the gradients from the last iteration.
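One common option (a sketch, not necessarily the best structure for your use case) is a dict keyed by parameter name, built from named_parameters(); the small Sequential model here is just a hypothetical example:

```python
import torch
import torch.nn as nn

# Hypothetical small network for illustration.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

x = torch.randn(16, 4)
loss = model(x).pow(2).mean()
loss.backward()

# One snapshot per iteration: a dict keyed by parameter name, so each
# gradient is looked up by name rather than by positional index.
last_grads = {name: p.grad.clone() for name, p in model.named_parameters()}
```

Rebuilding the dict each iteration overwrites the old snapshot, so only the last iteration's gradients are kept, and no history accumulates.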
For more details, I'm trying to implement Algorithm 1 in this paper.
I'm afraid you will have to do the bookkeeping by hand and potentially implement a new optimizer.
As an example, you can look at how RMSprop handles such bookkeeping.
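The bookkeeping pattern looks roughly like this: torch.optim.Optimizer gives every parameter its own state dict (self.state[p]), which is where RMSprop keeps its running square average. The sketch below is a hypothetical SGD variant that stashes the previous step's gradient the same way; it is not the paper's algorithm, just the storage pattern:

```python
import torch
from torch.optim import Optimizer

class PrevGradSGD(Optimizer):
    """Plain SGD that also keeps the last seen gradient in per-parameter
    state, mirroring the bookkeeping style of torch.optim.RMSprop.
    Hypothetical example, not Algorithm 1 from the paper."""

    def __init__(self, params, lr=0.01):
        super().__init__(params, dict(lr=lr))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                # self.state[p] is this parameter's private storage.
                state = self.state[p]
                # Stash a copy of this step's gradient before the update.
                state["prev_grad"] = p.grad.clone()
                p.add_(p.grad, alpha=-group["lr"])

# Usage sketch:
w = torch.randn(3, requires_grad=True)
opt = PrevGradSGD([w], lr=0.1)
loss = (w ** 2).sum()
loss.backward()
opt.step()
# opt.state[w]["prev_grad"] now holds the gradient from this step.
```

Keeping per-parameter tensors in self.state also means they are saved and restored automatically by the optimizer's state_dict() / load_state_dict().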