[Solved][Pytorch1.5] RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

I’m facing the same problem. I tried the solutions suggested above, but they didn’t work. I have N agents, and each agent owns an independent actor and critic. Each agent receives different states according to the label assigned to it.
```python
all_agents = []
all_agents.append(Agent(actor_dims, critic_dims))

for agent_idx, agent in enumerate(all_agents):
    i = agent.agent_label
    critic_value_ = agent.target_critic.forward(states_[i], new_actions_cluster[i]).flatten()
    critic_value = agent.critic.forward(states[i], old_actions_cluster[i]).flatten()

    target = rewards[:, agent_idx] + agent.gamma * critic_value_
    critic_loss = F.mse_loss(critic_value.float(), target.float())

    agent.critic.optimizer.zero_grad()
    critic_loss.backward(retain_graph=True)

    actor_loss = agent.critic.forward(states[i], mu_cluster[i]).flatten()
    actor_loss = -T.mean(actor_loss)

    agent.actor.optimizer.zero_grad()
    actor_loss.backward()

    agent.critic.optimizer.step()
    agent.actor.optimizer.step()
```

```
[W …\torch\csrc\autograd\python_anomaly_mode.cpp:85] Warning: Error detected in AddmmBackward. No forward pass information available. Enable detect anomaly during forward pass for more information. (function _print_stack)
Traceback (most recent call last):

allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64, 3]], which is output 0 of TBackward, is at version 4; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```
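
The warning in that output already hints at a first debugging step: enable anomaly detection before the forward pass, so the failing backward call is annotated with the forward operation that created the problematic tensor. A minimal sketch with toy tensors (not the training loop above):

```python
import torch

# Enable anomaly detection before the forward pass so a failing backward
# call also prints the forward operation that produced the bad gradient.
torch.autograd.set_detect_anomaly(True)

x = torch.randn(4, 3, requires_grad=True)
loss = (x * 2).sum()
loss.backward()  # any backward run from here on carries the extra forward trace
```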

Same as before: [Solved][Pytorch1.5] RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation - #37 by ptrblck

Could you explain why `retain_graph=True` is used?

When I remove `retain_graph=True`, it gives another error:

RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.
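
For context, a change that is often suggested for this kind of loop is to treat the TD target as a constant by detaching it, so the critic loss no longer backpropagates into the target network and usually no longer needs `retain_graph=True`. A self-contained sketch of that idea, with toy shapes and stand-in modules rather than this thread's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for one agent's critic and target critic (hypothetical sizes).
critic = nn.Linear(6, 1)
target_critic = nn.Linear(6, 1)
optimizer = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma = 0.99

states = torch.randn(64, 4)
actions = torch.randn(64, 2)
next_states = torch.randn(64, 4)
next_actions = torch.randn(64, 2)
rewards = torch.randn(64)

critic_value_ = target_critic(torch.cat([next_states, next_actions], dim=1)).flatten()
critic_value = critic(torch.cat([states, actions], dim=1)).flatten()

# Treat the TD target as a constant: detaching it means the critic loss does
# not backpropagate into the target network (or into next_actions), so this
# backward call does not need retain_graph=True.
target = rewards + gamma * critic_value_.detach()

critic_loss = F.mse_loss(critic_value, target)
optimizer.zero_grad()
critic_loss.backward()
optimizer.step()
```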

I modified the code as follows, and it is working, but I’m not sure whether this approach is correct:

```python
all_agents = []
all_agents.append(Agent(actor_dims, critic_dims))

for agent_idx, agent in enumerate(all_agents):
    i = agent.agent_label
    critic_value_ = agent.target_critic.forward(states_[i], new_actions_cluster[i]).flatten()
    critic_value = agent.critic.forward(states[i], old_actions_cluster[i]).flatten()

    target = rewards[:, agent_idx] + agent.gamma * critic_value_
    agent.critic_loss = F.mse_loss(critic_value.float(), target.float())
    agent.critic_loss.backward(retain_graph=True)

for agent_idx, agent in enumerate(all_agents):
    agent.critic.optimizer.zero_grad()

for agent_idx, agent in enumerate(all_agents):
    i = agent.agent_label
    agent.actor_loss = agent.critic.forward(states[i], mu_cluster[i], typ).flatten()
    agent.actor_loss = -T.mean(agent.actor_loss)
    agent.actor_loss.backward(retain_graph=True)

for agent_idx, agent in enumerate(all_agents):
    agent.actor.optimizer.step()
    # agent.actor.optimizer.zero_grad()

for agent_idx, agent in enumerate(all_agents):
    agent.actor.optimizer.zero_grad()
```
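
For what it's worth, the reordering above follows a general rule: every backward pass that still needs a parameter's current value must finish before any `optimizer.step()` modifies that parameter in place. A minimal self-contained illustration of the rule (a toy network, not this thread's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A toy network reused by two losses, loosely mirroring the shared graph
# in the multi-agent loop above.
net = nn.Linear(3, 1)
opt = torch.optim.SGD(net.parameters(), lr=0.1)

x = torch.randn(64, 3)
loss_a = F.mse_loss(net(x).flatten(), torch.zeros(64))
loss_b = F.mse_loss(net(x).flatten(), torch.ones(64))

opt.zero_grad()

# Run every backward pass before any in-place parameter update.  Calling
# opt.step() between these two backward calls would bump the weight's version
# counter and reproduce the "modified by an inplace operation" error.
loss_a.backward()
loss_b.backward()

# Only now modify the parameters in place.
opt.step()
```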

I met this error when I was doing PPO (Proximal Policy Optimization). I solved the problem by defining a target network and a main network. At the beginning, the target network has the same parameter values as the main network. During training, the target network parameters are assigned to the main network every fixed number of time steps. The details can be found in the code: https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_colab.ipynb
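
For reference, that periodic copy usually comes down to `load_state_dict`; here is a rough sketch with hypothetical network names and sync interval (see the linked notebook for the actual code):

```python
import torch.nn as nn

# Hypothetical policy networks; any pair of nn.Modules with the same architecture works.
policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 4))
policy_old = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 4))

# Start both networks from the same parameter values.
policy_old.load_state_dict(policy.state_dict())

sync_every = 100  # hypothetical number of time steps between copies
for step in range(1, 1001):
    # ... collect experience with policy_old and update policy here ...
    if step % sync_every == 0:
        # Periodically copy the updated parameters across.
        policy_old.load_state_dict(policy.state_dict())
```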