When and why do I need to use detach() in loss calculation and backpropagation?

Hi,

I am working on a project where each agent has its own networks, so I need to call backward() multiple times.
I have already posted a topic illustrating it: "I am training my multi agents reinforcement learning project, and I got an error 'Trying to backward through the graph a second time…'" - reinforcement-learning - PyTorch Forums.

The modified code is here:

    for agent_idx in range(self.n_agents):
        agent = self.agents[agent_idx]

        """ Critic Part """
        # current Q estimates
        current_Q1, current_Q2 = agent.critic.forward(states, old_actions)
        # target Q value (no gradients needed for the target)
        with T.no_grad():
            target_Q1, target_Q2 = agent.target_critic.forward(states_, new_actions)
            target_Q_min = T.min(target_Q1, target_Q2)
            # target_Q[dones[:, 0]] = 0.0
            target_Q = rewards[:, agent_idx] + (agent.gamma * target_Q_min)
        # critic loss calculation
        agent.critic_loss = F.mse_loss(current_Q1.float(), target_Q.float()) + \
                            F.mse_loss(current_Q2.float(), target_Q.float())

        # critic optimization
        agent.critic.optimizer.zero_grad()
        agent.critic_loss.backward()
        agent.critic.optimizer.step()

        """ Actor Part """
        if steps_total % self.freq == 0 and steps_total > 0:
            # actor loss calculation; here I use detach() on the input variables
            agent.actor_loss = agent.critic.Q1(states.detach(), mu.detach())
            agent.actor_loss = -T.mean(agent.actor_loss)
            # actor optimization
            agent.actor.optimizer.zero_grad()
            agent.actor_loss.backward()
            agent.actor.optimizer.step()
            agent.update_network_parameters()

After adding some detach() calls, my code runs. However, I have no idea when and why I need to use this function.
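As far as I understand, detach() returns a tensor that shares the same data but is cut out of the autograd graph, so no gradient flows back through it. Here is a tiny standalone sketch of my understanding (the tensors are just for illustration, not from my project):

    import torch

    x = torch.randn(3, requires_grad=True)
    y = (x * 2).sum()
    y.backward()                    # gradient flows back to x
    print(x.grad)                   # tensor([2., 2., 2.])

    z = (x.detach() * 2).sum()      # detach() cuts the link to x's graph
    print(z.requires_grad)          # False -> z.backward() would raise an error

Is that the right mental model for why it helps in my case?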

Another issue is that both the critic and the actor need some variables (states, old_actions, states_, new_actions, mu) to calculate their losses. In the critic part, I do not need to detach the variables (states, old_actions, states_, new_actions), and backward() can be called multiple times without error. However, if the variables (states, mu) are not detached in the actor part, I get the error about 'backward through the graph a second time'. I do not understand why this happens.
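To make this second issue concrete, here is a minimal standalone sketch of the pattern I think I am hitting (two small Linear layers standing in for my networks, nothing project-specific): once backward() has been called through a shared intermediate tensor, calling backward() on a second loss that reuses that tensor fails, unless the tensor is detached.

    import torch

    net_a = torch.nn.Linear(4, 4)
    net_b = torch.nn.Linear(4, 1)

    x = torch.randn(2, 4)
    hidden = net_a(x)                   # shared intermediate tensor

    loss1 = hidden.sum()
    loss1.backward()                    # frees the graph going through net_a

    loss2 = net_b(hidden).sum()
    # loss2.backward()                  # RuntimeError: Trying to backward through the graph a second time
    loss3 = net_b(hidden.detach()).sum()
    loss3.backward()                    # works: gradients only flow into net_b

Is detaching like this the correct fix, or is it just hiding the real problem?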