Question about DDPG loss.backward()

jayjay · November 1, 2023, 5:00pm

In DDPG implementation, I have a quick question about loss.backward()

Following is the loss.update part of DDPG

target_Q = critic_target(next_state, actor_target(next_state))
target_Q = reward + (done * args.gamma * target_Q).detach()
current_Q = critic(state, action)
critic_loss = F.mse_loss(current_Q, target_Q)

critic_optimizer.zero_grad()
critic_loss.backward()
critic_optimizer.step()

actor_loss = -critic(state, actor(state)).mean()

actor_optimizer.zero_grad()
actor_loss.backward()
actor_optimizer.step()

My question is that once critic_loss.backward() is done, gradients(from critic_loss to each leaf variables end) are computed and when actor_loss.backward() is called, these gradients made in critic_loss.backward() may affect gradients values for actor networks (because of actor_loss form(=critic(state, actor(state)))?

vmoens · November 1, 2023, 5:27pm

Yeah, in some implementations people just don’t care about that because they will call zero grad on the critic just after (like you do).
But in practice it’s best not to backprop through the critic at all to avoid surprises (eg when copy pasting the code).
If someone is calling zero_grad after the optimizer step (which is generally ok) in this case it will be a problem.
Also, you’re spending time computing gradients that you’re not going to use.

TL;DR: best to exclude your critic when you’re comptuting grads for the actor.