Why does backward propagation fail in my MADDPG implementation?

I started training my MADDPG model, but something goes wrong when computing the backward pass.
Here is the detail of the error

And here is the code of the training loop:

    for idx in range(1, len(self.agents) + 1):
        with torch.autograd.set_detect_anomaly(True):
            # Target critic evaluated on the next state / next actions
            critic_value_ = \
                self.agents[idx - 1].target_critic.forward(new_pos_state, new_phase_state, new_actions).flatten()

            # Zero out bootstrapped values for terminal transitions (in-place)
            critic_value_[dones[:, 0]] = 0.0
            critic_value = self.agents[idx - 1].critic.forward(pos_state, phase_state, old_actions).flatten()

            # TD target and critic update
            target = rewards[:, idx - 1] + self.agents[idx - 1].gamma * critic_value_
            critic_loss = F.mse_loss(target, critic_value)
            self.agents[idx - 1].critic.optimizer.zero_grad()
            critic_loss.backward(retain_graph=True)
            self.agents[idx - 1].critic.optimizer.step()

            # Actor update: maximize the critic's value of the actor's actions
            actor_loss = self.agents[idx - 1].critic.forward(pos_state, phase_state, mu).flatten()
            actor_loss = -torch.mean(actor_loss)
            self.agents[idx - 1].actor.optimizer.zero_grad()
            actor_loss.backward(retain_graph=True)
            self.agents[idx - 1].actor.optimizer.step()

            # Soft-update the target networks
            self.agents[idx - 1].update_network_parameters()
I suspect the in-place assignment critic_value_[dones[:, 0]] = 0.0 may be the problem, but I have no idea how to fix it properly.
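One alternative I was considering is replacing the in-place write with torch.where, which builds a new tensor instead of mutating one that autograd may have saved for the backward pass. A minimal sketch (done_mask stands in for dones[:, 0]; the values are made up for illustration):

```python
import torch

# Dummy stand-ins for critic_value_ and dones[:, 0]
critic_value_ = torch.tensor([1.5, 2.0, 3.0])
done_mask = torch.tensor([False, True, False])

# Out-of-place masking: returns a fresh tensor, leaving the
# original (and any activations autograd saved) untouched
critic_value_ = torch.where(done_mask, torch.zeros_like(critic_value_), critic_value_)
print(critic_value_.tolist())  # [1.5, 0.0, 3.0]
```

Would this kind of change be the right fix here, or is the problem elsewhere in the loop?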
The rest of the related code has been uploaded to my GitHub, and I can provide more if necessary.

I hope someone can help me, thanks!