I’m facing the same problem; could you help me, please? I tried the solutions suggested above, but they didn’t work. I have N agents, and each agent owns an independent actor and critic. Each agent has different states according to the label assigned to it.
###############
[W …\torch\csrc\autograd\python_anomaly_mode.cpp:85] Warning: Error detected in AddmmBackward. No forward pass information available. Enable detect anomaly during forward pass for more information. (function _print_stack)
Traceback (most recent call last):
allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64, 3]], which is output 0 of TBackward, is at version 4; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
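For reference, this error usually means that a tensor saved for a pending backward pass was updated in place, most often because an optimizer.step() ran between the forward pass and a backward call that still needs those parameters. A minimal sketch of that failure mode (the layer and optimizer here are illustrative, not taken from your code):

import torch
import torch.nn as nn

net = nn.Linear(3, 3)
opt = torch.optim.SGD(net.parameters(), lr=0.1)

x = torch.randn(64, 3, requires_grad=True)
y = net(x)                          # forward pass saves net.weight for backward
loss_a = y.sum()
loss_b = y.pow(2).sum()

loss_a.backward(retain_graph=True)
opt.step()                          # in-place parameter update bumps the saved tensor's version
loss_b.backward()                   # RuntimeError: ... modified by an inplace operation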
When I remove retain_graph=True, it gives another error:
RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.
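That second error is the other side of the same trade-off: once backward() has run through a graph, its intermediate buffers are freed, so a second backward through the same graph needs retain_graph=True, or each loss needs its own forward pass. A small illustrative sketch (again, not your code):

import torch
import torch.nn as nn

net = nn.Linear(3, 3)
x = torch.randn(8, 3)

# Reusing one graph for two backward calls requires retain_graph=True on the first call:
y = net(x)
y.sum().backward(retain_graph=True)
y.pow(2).sum().backward()

# The usually cleaner alternative is a separate forward pass per loss,
# so no graph is reused and retain_graph is not needed:
net(x).sum().backward()
net(x).pow(2).sum().backward()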
I modified the code as follows, and it is working, but I’m not sure whether this approach is correct:
import torch as T
import torch.nn.functional as F

all_agents = []
all_agents.append(Agent(actor_dims, critic_dims))

for agent_idx, agent in enumerate(all_agents):
    i = agent.agent_label
    critic_value_ = agent.target_critic.forward(states_[i], new_actions_cluster[i]).flatten()
    critic_value = agent.critic.forward(states[i], old_actions_cluster[i]).flatten()
    target = rewards[:, agent_idx] + agent.gamma * critic_value_
    agent.critic_loss = F.mse_loss(critic_value.float(), target.float())
    agent.critic_loss.backward(retain_graph=True)

for agent_idx, agent in enumerate(all_agents):
    agent.critic.optimizer.step()

for agent_idx, agent in enumerate(all_agents):
    agent.critic.optimizer.zero_grad()

for agent_idx, agent in enumerate(all_agents):
    i = agent.agent_label
    agent.actor_loss = agent.critic.forward(states[i], mu_cluster[i], typ).flatten()
    agent.actor_loss = -T.mean(agent.actor_loss)
    agent.actor_loss.backward(retain_graph=True)

for agent_idx, agent in enumerate(all_agents):
    agent.actor.optimizer.step()
    # agent.actor.optimizer.zero_grad()

for agent_idx, agent in enumerate(all_agents):
    agent.actor.optimizer.zero_grad()
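For what it’s worth, a more conventional per-agent update avoids retain_graph entirely: compute the bootstrapped target under torch.no_grad(), and for each agent run zero_grad → backward → step before moving on, so every loss gets its own fresh graph. A hedged sketch along those lines, reusing the names from your snippet (all_agents, states, states_, rewards and the *_cluster tensors are assumed to exist as above; I dropped the extra typ argument, and I assume mu_cluster[i] comes from agent i’s own actor, otherwise recompute it inside the loop):

import torch as T
import torch.nn.functional as F

def update_all_agents(all_agents, states, states_, rewards,
                      old_actions_cluster, new_actions_cluster, mu_cluster):
    for agent_idx, agent in enumerate(all_agents):
        i = agent.agent_label

        # Critic update: the bootstrapped target carries no gradient,
        # so the target-network graph is never reused.
        with T.no_grad():
            critic_value_ = agent.target_critic.forward(states_[i], new_actions_cluster[i]).flatten()
            target = rewards[:, agent_idx] + agent.gamma * critic_value_
        critic_value = agent.critic.forward(states[i], old_actions_cluster[i]).flatten()
        critic_loss = F.mse_loss(critic_value.float(), target.float())

        agent.critic.optimizer.zero_grad()
        critic_loss.backward()              # one fresh graph per loss, no retain_graph
        agent.critic.optimizer.step()

        # Actor update: a recomputed forward pass through the just-updated critic.
        actor_loss = -T.mean(agent.critic.forward(states[i], mu_cluster[i]).flatten())

        agent.actor.optimizer.zero_grad()
        actor_loss.backward()
        agent.actor.optimizer.step()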
I met this error when I was doing PPO (Proximal Policy Optimization). I solved it by defining a target network and a main network. At the beginning, the target network has the same parameter values as the main network. During training, the main network’s parameters are copied into the target network every fixed number of time steps. The details can be found in the code: https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_colab.ipynb
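For reference, the pattern in that repository keeps a frozen copy of the policy for collecting rollouts and copies the trained weights into it after each update, so no graph is ever shared between updates. A condensed sketch of that idea (the two-layer network and the placeholder loss below are illustrative, not the repository’s actual ActorCritic module or PPO objective):

import copy
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
policy_old = copy.deepcopy(policy)            # frozen copy used for rollouts
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

# Acting: rollouts come from the frozen copy under no_grad, so the stored
# trajectories are not connected to the graph of the network being trained.
state = torch.randn(1, 4)
with torch.no_grad():
    action_logits = policy_old(state)

# Updating: train the main policy on the stored rollouts, then sync the copy.
loss = policy(state).pow(2).mean()            # placeholder, not the real PPO loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
policy_old.load_state_dict(policy.state_dict())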