Why does backward propagation fail in my MADDPG implementation?

I started training my MADDPG model, but something goes wrong when computing the backward pass.
Here is the detail of the error

And here is the code of the training loop:

    for idx in range(1, len(self.agents) + 1):
        with torch.autograd.set_detect_anomaly(True):
            # Target critic evaluated on the next state / next actions
            critic_value_ = \
                self.agents[idx - 1].target_critic.forward(new_pos_state, new_phase_state, new_actions).flatten()

            # Zero out bootstrapped values for terminal transitions (in-place)
            critic_value_[dones[:, 0]] = 0.0
            critic_value = self.agents[idx - 1].critic.forward(pos_state, phase_state, old_actions).flatten()

            # TD target and critic update
            target = rewards[:, idx - 1] + self.agents[idx - 1].gamma * critic_value_
            critic_loss = F.mse_loss(target, critic_value)
            self.agents[idx - 1].critic.optimizer.zero_grad()
            critic_loss.backward(retain_graph=True)
            self.agents[idx - 1].critic.optimizer.step()

            # Actor update: maximize the critic's value of the actor's actions
            actor_loss = self.agents[idx - 1].critic.forward(pos_state, phase_state, mu).flatten()
            actor_loss = -torch.mean(actor_loss)
            self.agents[idx - 1].actor.optimizer.zero_grad()
            actor_loss.backward(retain_graph=True)
            self.agents[idx - 1].actor.optimizer.step()

            # Soft-update the target networks
            self.agents[idx - 1].update_network_parameters()
I suspect the in-place assignment critic_value_[dones[:, 0]] = 0.0 may be the problem, but I have no idea how to fix it properly.
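One alternative I was considering is replacing the in-place write with torch.where, which builds a new tensor instead of mutating one that autograd may have saved for the backward pass. A minimal sketch (done_mask stands in for dones[:, 0]; the values are made up for illustration):

```python
import torch

# Dummy stand-ins for critic_value_ and dones[:, 0]
critic_value_ = torch.tensor([1.5, 2.0, 3.0])
done_mask = torch.tensor([False, True, False])

# Out-of-place masking: returns a fresh tensor, leaving the
# original (and any activations autograd saved) untouched
critic_value_ = torch.where(done_mask, torch.zeros_like(critic_value_), critic_value_)
print(critic_value_.tolist())  # [1.5, 0.0, 3.0]
```

Would this kind of change be the right fix here, or is the problem elsewhere in the loop?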
The rest of the related code has been uploaded to my GitHub, and I can provide more if necessary.

I hope someone can help me, thanks!