Hi,
I am running a project where each agent has its own networks, so I need to call backward() multiple times. I have posted an issue describing it: "I am training my multi-agent reinforcement learning project, and I got an error 'Trying to backward through the graph a second time…'" - reinforcement-learning - PyTorch Forums.
The modified code is here:
for agent_idx in range(self.n_agents):
    agent = self.agents[agent_idx]

    """ Critic Part """
    # current Q estimates
    current_Q1, current_Q2 = agent.critic.forward(states, old_actions)
    # target Q value (no gradients needed for the target)
    with T.no_grad():
        target_Q1, target_Q2 = agent.target_critic.forward(states_, new_actions)
        target_Q_min = T.min(target_Q1, target_Q2)
        # target_Q[dones[:, 0]] = 0.0
        target_Q = rewards[:, agent_idx] + (agent.gamma * target_Q_min)
    # critic loss calculation
    agent.critic_loss = F.mse_loss(current_Q1.float(), target_Q.float()) + \
                        F.mse_loss(current_Q2.float(), target_Q.float())
    # critic optimization
    agent.critic.optimizer.zero_grad()
    agent.critic_loss.backward()
    agent.critic.optimizer.step()

    """ Actor Part """
    if steps_total % self.freq == 0 and steps_total > 0:
        # actor loss calculation; here I use detach() on the input variables
        agent.actor_loss = agent.critic.Q1(states.detach(), mu.detach())
        agent.actor_loss = -T.mean(agent.actor_loss)
        # actor optimization
        agent.actor.optimizer.zero_grad()
        agent.actor_loss.backward()
        agent.actor.optimizer.step()
        agent.update_network_parameters()
After adding some detach() calls, my code runs. However, I have no idea when and why I need to use this function.
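To illustrate the error I keep hitting, here is a minimal standalone snippet (the tensors here are made up for illustration, not from my project):

```python
import torch

x = torch.ones(3, requires_grad=True)
loss = (x * 2).sum()

loss.backward()          # first backward: fine; the graph is freed afterwards
try:
    loss.backward()      # second backward over the same (now freed) graph
except RuntimeError as e:
    print("second backward failed:", type(e).__name__)

# detach() returns a tensor cut off from the graph, so anything built on
# top of it belongs to a new, independent graph with no recorded history
z = x.detach() * 2
print(z.requires_grad)   # False
```

So my understanding is that backward() frees the graph it walks, and detach() prevents a later loss from walking that same freed graph, but I am not sure this is the whole picture.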
Another issue is that both the critic and the actor need some variables (states, old_actions, states_, new_actions, mu) to calculate their losses. In the critic part, I do not need to detach the variables (states, old_actions, states_, new_actions), and backward() can be called multiple times. However, if the variables (states, mu) are not detached in the actor part, I get the "backward through the graph a second time" error. I do not understand why.
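To make the question concrete, here is a toy version of what I think is happening; w_actor and w_critic are made-up stand-ins for my networks' parameters, and mu plays the role of the shared actor output:

```python
import torch

w_actor = torch.ones(2, requires_grad=True)   # stands in for actor params
w_critic = torch.ones(2, requires_grad=True)  # stands in for critic params

mu = w_actor * 3                              # shared intermediate, like mu

critic_loss = ((mu * w_critic) ** 2).sum()
critic_loss.backward()                        # frees the graph through mu

actor_loss = -(mu * w_critic).sum()
try:
    actor_loss.backward()                     # walks the freed part of the
except RuntimeError as e:                     # graph through mu again
    print(type(e).__name__)                   # RuntimeError

# Detaching mu cuts the second loss off from the already-freed segment;
# gradients still reach w_critic through its own fresh graph
actor_loss2 = -(mu.detach() * w_critic).sum()
actor_loss2.backward()                        # works
```

If this toy example matches my real code, then the critic backward() calls work because each agent's critic loss has its own graph, while the actor loss reuses mu, whose part of the graph was already freed. Is that the right way to think about it?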