Help with computing gradients wrt output itself

Obtaining the gradients with respect to the layer outputs

The formula for a NN output is typically y = activation(w.x), and when we do automatic differentiation we usually differentiate with respect to the weight vector, as in back-propagation. However, how do I differentiate with respect to y itself?

To provide some background, I am passing y, the output of my actor network, into my critic network, and I would like to minimize the output of the critic. So I want to differentiate the predicted Q-value with respect to the input of the critic, i.e. the output of the actor. However, when I call backward(), it computes the gradients of the weights in the output layer of the actor network. How can I get the gradient with respect to the output of the actor network itself?

Currently, the very hacky way I have been able to achieve this is to take the weight vector of the output layer of the actor network, add the computed gradient to it, manually compute the output of the actor network (by matrix multiplication), then pass that output to the critic and verify that the predicted Q-value has indeed decreased.

Is there a better way to do this?
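For reference, here is a sketch of the quantity being asked for, in notation that is not part of the post above: a is the actor output, q = Q(a) is the critic output, and W stands for the actor's output-layer weights (shapes and indices are glossed over). By the chain rule:

```latex
\frac{\partial q}{\partial W}
  = \frac{\partial q}{\partial a}\,\frac{\partial a}{\partial W}
```

So back-propagation already computes \partial q / \partial a as an intermediate step on the way to the weight gradient. In PyTorch, backward() by default only stores .grad on leaf tensors such as W, which is why the gradient does not show up on the actor output itself.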

Okay, I figured out how to do this, and it was amazingly simple too. I'll leave this here anyway for anyone else who stumbles on the same problem in the future, because I had a really hard time googling for a solution.

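act = actor(state)
act.retain_grad()  # act is a non-leaf tensor, so ask autograd to keep its .grad
q = critic(act)
loss = -q
loss.backward()  # assumes q holds a single value; fills act.grad with d(loss)/d(act) = -dq/d(act)
# Adding the gradient of -q moves act in the direction that lowers q.
new_act = act + act.grad * 0.01
q = critic(new_act)  # for a small enough step this should be lower than the first q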
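For anyone who wants to run this end to end, below is a minimal self-contained sketch. The actor and critic here are tiny hypothetical stand-ins (an nn.Sequential and an nn.Linear with arbitrary sizes), not the networks from the question. It also shows torch.autograd.grad as an alternative to retain_grad(): it returns the gradient with respect to the actor output directly, without writing anything into the parameters' .grad fields.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the real networks; layer sizes are arbitrary.
actor = nn.Sequential(nn.Linear(4, 2), nn.Tanh())
critic = nn.Linear(2, 1)
state = torch.randn(1, 4)

# Option 1: retain_grad() keeps .grad on the non-leaf tensor after backward().
act = actor(state)
act.retain_grad()
q = critic(act)
(-q).sum().backward()  # .sum() makes the loss a scalar even for a batch
grad_retained = act.grad.clone()

# Option 2: torch.autograd.grad returns the gradient as a tuple, one entry
# per input, and does not accumulate into any .grad attribute.
act2 = actor(state)
q2 = critic(act2)
(grad_direct,) = torch.autograd.grad((-q2).sum(), act2)

# Both options give d(-q)/d(act). As in the snippet above, adding a small
# multiple of it to the action should lower q.
step = 0.01
with torch.no_grad():
    q_before = critic(act2).item()
    q_after = critic(act2 + step * grad_direct).item()

print(torch.allclose(grad_retained, grad_direct), q_before, q_after)
```

Option 2 may be the more convenient choice when the actor's parameter gradients should stay untouched, for example when they are computed in a separate backward pass.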