No, sorry, I just mistyped it in the snippet (I've edited it now); j is the index of the inner loop, so it makes sense.
I think that, since fmodel is functional and the graph of all the inner iterations is kept in order to backpropagate to buff_imgs, when I call backward on ds_loss to update the current learning rate, the gradient actually flows back through the whole graph. What could I do to prevent this? I need the whole graph to update buff_imgs, but when I update the learning rate I only need the gradient to flow to the current learning rate tensor.
I think one solution could be to compute the output of fmodel twice: once to update buff_imgs after all the iterations, and once with no_grad() to update the learning rate. Would that work?
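To make it concrete, here is a rough, self-contained toy of what I mean, with a hand-rolled inner loop instead of fmodel; toy_model, buff_imgs, lr and ds_loss are just stand-in names, not my real code. For the second pass I detach the inner state instead of wrapping it in no_grad(), so that ds_loss still has a small graph reaching the learning-rate tensor:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

toy_model = nn.Linear(4, 1)
for p in toy_model.parameters():
    p.requires_grad_(False)  # freeze the toy model; only buff_imgs and lr matter here

buff_imgs = torch.randn(8, 4, requires_grad=True)  # optimised through the full unrolled graph
lr = torch.tensor(0.1, requires_grad=True)         # learnable inner learning rate
lr_opt = torch.optim.SGD([lr], lr=1e-2)

imgs = buff_imgs
for j in range(3):  # unrolled inner loop
    inner_loss = toy_model(imgs).pow(2).mean()
    grad_imgs, = torch.autograd.grad(inner_loss, imgs, create_graph=True)

    # Pass for the learning rate: inputs detached, only lr stays in the graph,
    # so this backward() never walks back through the earlier inner iterations.
    ds_loss = toy_model(imgs.detach() - lr * grad_imgs.detach()).pow(2).mean()
    lr_opt.zero_grad()
    ds_loss.backward()
    lr_opt.step()

    # Pass for buff_imgs: kept in the big graph. lr enters as a constant copy so
    # the in-place optimizer step above doesn't invalidate tensors saved for the
    # final backward.
    imgs = imgs - lr.detach().clone() * grad_imgs

# Outer update: this backward walks every inner iteration back to buff_imgs.
outer_loss = toy_model(imgs).pow(2).mean()
outer_loss.backward()
print(buff_imgs.grad.norm().item(), lr.item())
```

In my real code the second forward would go through fmodel in the same way, with everything except the learning rate detached; I'm not sure whether there is a cleaner way to do this with higher.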