I have a quick question about loss.backward() in a DDPG implementation.
Below is the loss-update part of the DDPG code:
```python
import torch.nn.functional as F

# --- critic update ---
# Note: `done` is assumed to already hold the continuation mask
# (i.e. 1 - done_flag), as in common DDPG reference implementations;
# if it holds the raw done flag, the mask is inverted.
target_Q = critic_target(next_state, actor_target(next_state))
target_Q = reward + (done * args.gamma * target_Q).detach()
current_Q = critic(state, action)
critic_loss = F.mse_loss(current_Q, target_Q)

critic_optimizer.zero_grad()
critic_loss.backward()
critic_optimizer.step()

# --- actor update ---
actor_loss = -critic(state, actor(state)).mean()

actor_optimizer.zero_grad()
actor_loss.backward()
actor_optimizer.step()
```
My question: once critic_loss.backward() is done, gradients are computed all the way back to the leaf variables. When actor_loss.backward() is then called, could the gradients already accumulated by critic_loss.backward() affect the gradient values for the actor network, given the form of actor_loss (= -critic(state, actor(state)).mean())?
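To make the concern concrete, here is a minimal sketch for checking it empirically. The helper report_actor_grads below is hypothetical (not part of the code above); it assumes actor is an ordinary nn.Module and simply inspects the .grad fields of its parameters, e.g. right after critic_loss.backward() and before actor_optimizer.zero_grad():

```python
def report_actor_grads(actor):
    # Hypothetical diagnostic helper: report whether any actor parameter
    # received a nonzero gradient. Since current_Q = critic(state, action)
    # uses the stored replay-buffer action (not actor(state)) and target_Q
    # is detached, one would expect every actor .grad to still be
    # None / zero at this point.
    for name, param in actor.named_parameters():
        touched = param.grad is not None and param.grad.abs().sum().item() > 0
        print(f"{name}: grad touched = {touched}")
```

Calling report_actor_grads(actor) between critic_optimizer.step() and actor_optimizer.zero_grad() would show whether the critic update wrote anything into the actor's gradients.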