I would like to compute the gradient of the “critic” output with respect to the “actor”'s weights.
(I am trying to apply the “deterministic policy gradient”, equation (6) of the following paper: http://proceedings.mlr.press/v32/silver14.pdf)
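For reference, the deterministic policy gradient in that paper has the form

\nabla_\theta J(\mu_\theta) = \mathbb{E}_{s \sim \rho^\mu}\!\left[ \nabla_\theta \mu_\theta(s) \, \nabla_a Q^\mu(s, a) \big|_{a = \mu_\theta(s)} \right]

where \mu_\theta is the actor and Q^\mu is the critic.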
“actor” and “critic” are two neural networks; consider the critic as the state-action value Q(s, a), which estimates the expected return. Therefore I had to concatenate the “states” vector with the action vector “actor_actions” and feed the result to the critic's input. Then I have to compute the gradient of the critic with respect to “states” by setting requires_grad of “states” to True.
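For concreteness, here is a minimal sketch of the two networks I have in mind; state_dim, action_dim, batch_size, and the layer sizes are placeholders, not my actual architecture:

import torch
import torch.nn as nn

state_dim, action_dim, batch_size = 8, 2, 32   # placeholder sizes

# deterministic policy mu(s): maps a state to an action
actor = nn.Sequential(
    nn.Linear(state_dim, 64), nn.ReLU(),
    nn.Linear(64, action_dim), nn.Tanh(),
)

# Q(s, a): consumes the concatenated [state, action] features, returns a scalar value
critic = nn.Sequential(
    nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

optimizer_actor = torch.optim.Adam(actor.parameters())   # holds only the actor's parameters

# 3-D batch (batch, time, features) so that dim=2 below concatenates along the feature axis
states = torch.randn(batch_size, 1, state_dim)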
optimizer_actor.zero_grad()        # clear stale gradients on the actor's parameters
actor_actions = actor(states)      # a = mu(s), differentiable w.r.t. the actor's weights
# critic input is the concatenation [s, a]; the minus sign turns descent into gradient ascent on Q
critic_values = -critic(torch.cat((states, actor_actions), dim=2)).sum() / batch_size
critic_values.backward()           # backpropagate through the critic into the actor
optimizer_actor.step()             # update only the actor's weights
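To see whether the gradient actually reaches the actor, my understanding is that one can inspect the .grad fields right after backward() (a quick check, reusing the names above):

# after backward(), every actor parameter should carry a non-None, non-zero gradient
for name, p in actor.named_parameters():
    print(name, None if p.grad is None else p.grad.norm().item())

# note: backward() also populates gradients on the critic's parameters as a side effect,
# so they presumably need to be zeroed (or the critic frozen) before the critic's own update
for name, p in critic.named_parameters():
    print(name, None if p.grad is None else p.grad.norm().item())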
Does the previous code snippet compute the gradient with respect to the “actor”'s weights?