I’m facing the same problem — could someone help, please?
I tried the solutions suggested above, but they didn’t work. I have N agents, each of which owns an independent actor and critic. Each agent receives different states according to its label.
```python
import torch as T
import torch.nn.functional as F

# one independent actor/critic pair per agent
all_agents = []
for _ in range(N):
    all_agents.append(Agent(actor_dims, critic_dims))

for agent_idx, agent in enumerate(all_agents):
    i = agent.agent_label

    # critic update
    critic_value_ = agent.target_critic.forward(states_[i], new_actions_cluster[i]).flatten()
    critic_value = agent.critic.forward(states[i], old_actions_cluster[i]).flatten()
    target = rewards[:, agent_idx] + agent.gamma * critic_value_
    critic_loss = F.mse_loss(critic_value.float(), target.float())
    agent.critic.optimizer.zero_grad()
    critic_loss.backward(retain_graph=True)

    # actor update
    actor_loss = agent.critic.forward(states[i], mu_cluster[i]).flatten()
    actor_loss = -T.mean(actor_loss)
    agent.actor.optimizer.zero_grad()
    actor_loss.backward()

    agent.critic.optimizer.step()
    agent.actor.optimizer.step()
```
This is the output I get:

```
[W …\torch\csrc\autograd\python_anomaly_mode.cpp:85] Warning: Error detected in AddmmBackward. No forward pass information available. Enable detect anomaly during forward pass for more information. (function _print_stack)
Traceback (most recent call last):
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64, 3]], which is output 0 of TBackward, is at version 4; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```
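For context, here is a minimal standalone sketch that reproduces this exact version-counter error outside of my code. The networks and names here are hypothetical (a two-layer actor and a linear critic, not the `Agent` class above); the point is only that an `optimizer.step()` modifies weights in place while a retained graph still needs their old versions, which seems to be the same pattern as in my loop:

```python
import torch
import torch.nn as nn

# Hypothetical minimal setup, not the Agent class from the question.
torch.manual_seed(0)
actor = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 3))  # two layers: the 2nd layer's
critic = nn.Linear(3, 1)                                 # weight is saved for backward
actor_opt = torch.optim.SGD(actor.parameters(), lr=0.1)

state = torch.randn(64, 4)
action = actor(state)                 # graph still references the actor's weights
loss1 = action.pow(2).mean()
loss1.backward(retain_graph=True)
actor_opt.step()                      # in-place weight update -> version bump

err_raised = False
try:
    # reuses the retained graph; backprop through the stepped layer fails
    critic(action).mean().backward()
except RuntimeError as e:
    err_raised = "inplace operation" in str(e)

# One common remedy: detach tensors produced by a network that has already
# been stepped (or finish every backward() before any optimizer.step()).
fixed_action = actor(state).detach()  # fresh forward pass, no cross-graph grad
critic(fixed_action).mean().backward()  # runs without error
print("reproduced:", err_raised)
```

In the multi-agent loop above, the analogous situation would be `actor.optimizer.step()` from one iteration invalidating the retained graph that a later agent's `backward()` still walks through (via the shared action clusters) — so either detaching the other agents' actions or moving all `step()` calls after all `backward()` calls might apply.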