theta' = theta - d Loss(theta, target) / d theta

This is the standard gradient-descent update (learning rate omitted for brevity).
Instead, I want to update theta as

theta = theta - d Loss(theta', target) / d theta

Since theta' itself depends on theta, I think this requires second-order gradients of the loss computed from theta' and target.
How can I implement this?
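Here is a minimal sketch of what I have in mind, using a toy scalar loss (the names `loss_fn`, `lr`, and the tensors are my own, not from any specific model). The idea is to build theta' with `create_graph=True` so the inner update stays in the autograd graph, and then backpropagate Loss(theta', target) all the way to theta:

```python
import torch

# Toy setup (assumed for illustration)
theta = torch.tensor([1.0, 2.0], requires_grad=True)
target = torch.tensor([0.0, 0.0])
lr = 0.1

def loss_fn(params, target):
    return ((params - target) ** 2).sum()

# Inner step: create_graph=True keeps d(inner_loss)/d(theta) itself
# differentiable, so second-order terms survive
inner_loss = loss_fn(theta, target)
grad_theta, = torch.autograd.grad(inner_loss, theta, create_graph=True)
theta_prime = theta - lr * grad_theta   # theta' is now a function of theta

# Outer loss at theta'; backward reaches theta through theta_prime
outer_loss = loss_fn(theta_prime, target)
outer_loss.backward()

# theta.grad now holds d Loss(theta', target) / d theta,
# including the term from d theta' / d theta
print(theta.grad)
```

For this quadratic loss the result can be checked by hand: theta' = (1 - 2*lr) * theta, so d Loss(theta')/d theta = 2 * theta' * (1 - 2*lr).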
If I call loss.backward() after the first update of theta, I would get d Loss(theta', target) / d theta'. If I then swap my model's parameters back from theta' to theta, will backward() compute d Loss(theta', target) / d theta?
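To probe this, I tried a minimal experiment (my own toy tensors, no real model): backward() differentiates with respect to the tensors that actually produced the loss in the recorded graph, so what matters is how theta' was constructed from theta, not which values the parameters hold afterwards.

```python
import torch

theta = torch.tensor([3.0], requires_grad=True)
theta_prime = theta - 0.5          # "first update"; still linked to theta in the graph
loss = (theta_prime ** 2).sum()    # Loss(theta')
loss.backward()                    # chain rule flows back to the leaf tensor theta
print(theta.grad)                  # d(theta'^2)/d theta = 2 * theta' = 5.0
```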