I am running a project where each agent has its own networks, so I need to call backward() multiple times.
I have posted an issue describing it: "I am training my multi-agent reinforcement learning project, and I got an error 'Trying to backward through the graph a second time…'" - reinforcement-learning - PyTorch Forums.
The modified code is here:
```python
for agent_idx in range(self.n_agents):
    """ Critic Part """
    # current Q estimate
    current_Q1, current_Q2 = agent.critic.forward(states, old_actions)
    # target Q value
    with T.no_grad():
        target_Q1, target_Q2 = agent.target_critic.forward(states_, new_actions)
        target_Q_min = T.min(target_Q1, target_Q2)
        # target_Q[dones[:, 0]] = 0.0
        target_Q = rewards[:, agent_idx] + (agent.gamma * target_Q_min)
    # critic loss calculation
    self.agents[agent_idx].critic_loss = F.mse_loss(current_Q1.float(), target_Q.float()) + \
        F.mse_loss(current_Q2.float(), target_Q.float())
    # critic optimization
    self.agents[agent_idx].critic.optimizer.zero_grad()
    self.agents[agent_idx].critic_loss.backward()
    self.agents[agent_idx].critic.optimizer.step()

    """ Actor Part """
    if steps_total % self.freq == 0 and steps_total > 0:
        # actor loss calculation, here I use detach() for the input variables
        self.agents[agent_idx].actor_loss = self.agents[agent_idx].critic.Q1(states.detach(), mu.detach())
        self.agents[agent_idx].actor_loss = -T.mean(self.agents[agent_idx].actor_loss)
        # actor optimization
        self.agents[agent_idx].actor.optimizer.zero_grad()
        self.agents[agent_idx].actor_loss.backward()
        self.agents[agent_idx].actor.optimizer.step()
        self.agents[agent_idx].update_network_parameters()
```
After adding some detach() calls, my code runs. However, I do not understand when and why I need to use this function.
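For reference, here is a minimal standalone sketch (toy tensors, not the project's actual networks) that reproduces the "backward through the graph a second time" error and shows what detach() changes:

```python
import torch

# Toy stand-ins: w plays the role of network parameters, and h is an
# intermediate result that carries a computation graph back to v.
w = torch.ones(3, requires_grad=True)
v = torch.ones(3, requires_grad=True)
h = v * 2                      # h remembers it was computed from v

loss1 = (w * h).sum()
loss1.backward()               # frees the saved buffers of h's graph

# Reusing h in a second loss and backpropagating again fails, because
# backward() walks from loss2 back through h into the already-freed graph:
second_backward_failed = False
try:
    loss2 = (w * h).sum()
    loss2.backward()           # RuntimeError: Trying to backward through the graph a second time...
except RuntimeError:
    second_backward_failed = True

# detach() returns a tensor that shares h's data but is cut off from v's
# graph, so the new loss's graph stops at the detached tensor:
w.grad = None                  # clear whatever accumulated so far
loss3 = (w * h.detach()).sum()
loss3.backward()               # succeeds; only w receives a gradient
```

So detach() is needed whenever a tensor still references a graph that an earlier backward() call has already freed: detaching it makes the next loss build a fresh, self-contained graph instead of walking back into the freed one.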
Another issue is that both the critic and the actor need some variables (states, old_actions, states_, new_actions, mu) to compute their losses. In the critic part, I do not need to detach the variables (states, old_actions, states_, new_actions), and backward() can be called multiple times. However, if the variables (states, mu) are not detached in the actor part, I get the 'backward through the graph a second time' error. I do not understand why.
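Assuming states/old_actions come from the replay buffer with no grad history, while mu is an actor output computed once and then reused by every agent (which matches this error pattern), the difference can be reproduced with toy nn.Linear stand-ins:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
actor = nn.Linear(4, 2)                        # stand-in for one agent's actor
critics = [nn.Linear(6, 1) for _ in range(2)]  # stand-ins for two agents' critics

# Replay-buffer data is a plain leaf tensor: it carries no graph, so each
# critic forward pass below builds a brand-new graph. That is why the critic
# losses never need detach() even though backward() runs once per agent.
states = torch.randn(5, 4)

# mu is computed ONCE, so all agents share the same graph back into the
# actor's parameters; the first backward() frees that shared graph.
mu = actor(states)

results = []
for critic in critics:
    q = critic(torch.cat([states, mu], dim=1)).mean()
    try:
        q.backward()             # agent 0: fine; agent 1: shared graph already freed
        results.append("ok")
    except RuntimeError:         # "Trying to backward through the graph a second time..."
        results.append("freed")

# Detaching mu cuts each loss's graph off at the critic input, so every
# iteration gets a self-contained graph and backward() always succeeds:
mu_d = mu.detach()
for critic in critics:
    q = critic(torch.cat([states, mu_d], dim=1)).mean()
    q.backward()
```

In other words: the replay-buffer variables are graph-free leaves, so each critic loss owns its entire graph, while mu drags the shared actor graph into every agent's loss. Note that detaching mu inside the actor loss also stops any gradient from reaching the actor's parameters, so the usual pattern is to recompute mu = actor(states) fresh inside each agent's actor update (or, less commonly, to pass retain_graph=True to backward()).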