I would like to compute the gradient of the “critic” output with respect to the “actor”'s weights.
(I am trying to apply the “deterministic policy gradient”, equation (6) of the following paper: http://proceedings.mlr.press/v32/silver14.pdf)
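For reference, the deterministic policy gradient in that paper has the form

\nabla_\theta J(\mu_\theta) = \mathbb{E}_{s \sim \rho^\mu}\!\left[ \nabla_\theta \mu_\theta(s) \, \nabla_a Q^\mu(s, a) \big|_{a = \mu_\theta(s)} \right]

where \mu_\theta is the actor and Q^\mu is the critic.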
“actor” and “critic” are two neural networks; consider the critic as the state-action value Q(s, a), which estimates the expected return. Therefore I had to concatenate the “states” vector with the action vector “actor_actions” and feed the result to the critic's input. Then I have to compute the gradient of the critic with respect to “states” by setting requires_grad of “states” to True.
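For concreteness, here is a minimal sketch of the two networks I have in mind; state_dim, action_dim, batch_size, and the layer sizes are placeholders, not my actual architecture:

import torch
import torch.nn as nn

state_dim, action_dim, batch_size = 8, 2, 32   # placeholder sizes

# deterministic policy mu(s): maps a state to an action
actor = nn.Sequential(
    nn.Linear(state_dim, 64), nn.ReLU(),
    nn.Linear(64, action_dim), nn.Tanh(),
)

# Q(s, a): consumes the concatenated [state, action] features, returns a scalar value
critic = nn.Sequential(
    nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

optimizer_actor = torch.optim.Adam(actor.parameters())   # holds only the actor's parameters

# 3-D batch (batch, time, features) so that dim=2 below concatenates along the feature axis
states = torch.randn(batch_size, 1, state_dim)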
optimizer_actor.zero_grad()        # clear stale gradients on the actor's parameters
actor_actions = actor(states)      # a = mu(s), differentiable w.r.t. the actor's weights
# critic input is the concatenation [s, a]; the minus sign turns descent into gradient ascent on Q
critic_values = -critic(torch.cat((states, actor_actions), dim=2)).sum() / batch_size
critic_values.backward()           # backpropagate through the critic into the actor
optimizer_actor.step()             # update only the actor's weights
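To see whether the gradient actually reaches the actor, my understanding is that one can inspect the .grad fields right after backward() (a quick check, reusing the names above):

# after backward(), every actor parameter should carry a non-None, non-zero gradient
for name, p in actor.named_parameters():
    print(name, None if p.grad is None else p.grad.norm().item())

# note: backward() also populates gradients on the critic's parameters as a side effect,
# so they presumably need to be zeroed (or the critic frozen) before the critic's own update
for name, p in critic.named_parameters():
    print(name, None if p.grad is None else p.grad.norm().item())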
Does the previous code snippet compute the gradient with respect to the “actor”'s weights?