Inverting Gradients - Gradient of critic network output wrt action

Hello there,
I want to implement the inverting gradients as shown in the paper " Deep Reinforcement Learning in Parametrized Action Space". But firstly I need to have gradient of the critic network’s output ( Q(s,a) ) with respect to the action ( dQ(s,a) / da ) which is (6) equation in the paper.

I can write the loss as,
loss = value_net(state, policy_net(state))

But how can I differentiate this network with respect to the action only, rather than the state?
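One way to get dQ/da directly is `torch.autograd.grad`, which lets you differentiate a scalar with respect to a specific tensor. A minimal sketch, assuming a toy critic that takes the concatenated state-action pair (the network shapes here are hypothetical, not from the paper):

```python
import torch
import torch.nn as nn

# Toy critic Q(s, a): a small MLP over the concatenated state-action pair.
# (Hypothetical sizes: state_dim=3, action_dim=2.)
state_dim, action_dim = 3, 2
value_net = nn.Sequential(
    nn.Linear(state_dim + action_dim, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
)

state = torch.randn(4, state_dim)                        # batch of states
action = torch.randn(4, action_dim, requires_grad=True)  # differentiate w.r.t. this

q = value_net(torch.cat([state, action], dim=1))  # Q(s, a), shape (4, 1)

# dQ/da: gradient of the summed Q-values w.r.t. the action tensor only.
# Summing is fine here because each Q_i depends only on action_i in the batch.
dq_da = torch.autograd.grad(q.sum(), action)[0]
print(dq_da.shape)  # same shape as action: (4, 2)
```

Because `state` does not have `requires_grad=True`, no gradient is computed for it; `torch.autograd.grad` returns gradients only for the tensors you list.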


@frknayk I am trying to implement this paper too and I'm having trouble implementing its gradient inversion. Have you figured it out?

I have found a TensorFlow implementation.

But I later decided to use loss = value_net(state, policy_net(state)) instead, for a few reasons.
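For reference, that loss corresponds to the usual DDPG-style actor update: backpropagating the (negated) critic output through the action into the policy. A rough sketch, with hypothetical network shapes; note that in practice you minimize the negative mean Q so that the actor ascends the critic:

```python
import torch
import torch.nn as nn

state_dim, action_dim = 3, 2
policy_net = nn.Sequential(nn.Linear(state_dim, 16), nn.ReLU(),
                           nn.Linear(16, action_dim), nn.Tanh())
value_net = nn.Sequential(nn.Linear(state_dim + action_dim, 16), nn.ReLU(),
                          nn.Linear(16, 1))

state = torch.randn(8, state_dim)
action = policy_net(state)
q = value_net(torch.cat([state, action], dim=1))

# DDPG-style actor loss: minimizing -Q pushes the policy toward
# actions the critic scores highly.
actor_loss = -q.mean()
actor_loss.backward()
# Gradients now flow into policy_net's parameters through the action.
# (value_net's parameters also receive grads unless frozen or zeroed
# before the actor's optimizer step.)
```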

@frknayk Were you able to implement inverting gradients? If so, can you please share a snippet?
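In case it helps, here is a minimal sketch of the inverting-gradients rule as I read it from the paper: a gradient that would push a bounded parameter toward its limit is scaled by the remaining headroom, so it shrinks to zero at the bound. The function name and the bounds `p_min`/`p_max` are my own choices, and the convention assumed here is that `grad` is the ascent direction dQ/da (i.e. positive entries mean "increase the action"):

```python
import torch

def invert_gradients(grad, action, p_min=-1.0, p_max=1.0):
    """Inverting-gradients rule (Hausknecht & Stone style sketch):
    gradients suggesting an increase are scaled by (p_max - p)/(p_max - p_min),
    gradients suggesting a decrease by (p - p_min)/(p_max - p_min)."""
    range_ = p_max - p_min
    scale_up = (p_max - action) / range_    # headroom toward the upper bound
    scale_down = (action - p_min) / range_  # headroom toward the lower bound
    return torch.where(grad > 0, grad * scale_up, grad * scale_down)

# Example: an action component near the upper bound gets a damped gradient.
action = torch.tensor([[0.9, -0.5]])
grad = torch.tensor([[1.0, 1.0]])
inv = invert_gradients(grad, action)
print(inv)  # tensor([[0.0500, 0.7500]])
```

The inverted gradient could then be applied via `torch.autograd.backward(action, grad_tensors=-inv)` (negated, since optimizers descend), or through a `Tensor.register_hook` on the actor's output; either way, this is only a sketch under the sign convention stated above.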