```python
from torch import autograd

grads = autograd.grad(loss, x, retain_graph=True)[0].view(x.size(0), -1)
grads = grads.detach()  # cut the graph between the loss and the RNN's input
out, h_t, c_t, h_t2, c_t2 = rnn(grads, h_t, c_t, h_t2, c_t2)
x = x - out  # out-of-place update of the optimizee's weights
```
I’m trying to implement Learning to Learn by Gradient Descent by Gradient Descent. I would like to be able to compute the gradient of the optimizee’s weights (`x`) with respect to the RNN’s weights (the RNN produces the step `out` with which the optimizee’s weights are updated).
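For context, here is a minimal, self-contained sketch of the setup I’m aiming for. Everything in it (`opt_rnn`, `loss_fn`, the quadratic optimizee, the unroll length) is a toy placeholder, not my real model:

```python
import torch
from torch import autograd, nn

# Toy stand-ins (placeholders, not my real model): a quadratic optimizee
# and a tiny per-coordinate LSTM optimizer.
x = torch.randn(5, requires_grad=True)              # optimizee weights
opt_rnn = nn.LSTMCell(input_size=1, hidden_size=1)  # optimizer RNN

def loss_fn(p):
    return (p ** 2).sum()

h = torch.zeros(x.numel(), 1)
c = torch.zeros(x.numel(), 1)

meta_loss = 0.0
for _ in range(3):  # short unroll of the inner optimization
    loss = loss_fn(x)
    grads = autograd.grad(loss, x, retain_graph=True)[0]
    # Detach the gradient before feeding the RNN (this drops second
    # derivatives, as in the paper), but update x out-of-place so the
    # history through `step` should be kept.
    h, c = opt_rnn(grads.detach().view(-1, 1), (h, c))
    step = h.view_as(x)
    x = x - step
    meta_loss = meta_loss + loss_fn(x)

meta_loss.backward()  # should populate .grad on opt_rnn.parameters()
```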
Will the above code store the history of computations on `x`? Or should the update be done in-place?
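For reference, this is how I’ve been checking whether history is recorded, assuming that inspecting `grad_fn` is a valid test:

```python
# After the out-of-place update x = x - out:
print(x.grad_fn)  # non-None (e.g. <SubBackward0>) if history was recorded
print(x.is_leaf)  # False once x has been rebound out-of-place
```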