RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 6]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further ab

My code is below:

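import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.distributions import Categorical

class Policy_Network(nn.Module):
    def __init__(self, state_input_dim, action_dim, lr):
        super(Policy_Network, self).__init__()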
        self.first_layer = nn.Linear(state_input_dim, 256)
        self.second_layer = nn.Linear(256, 64)
        #self.third_layer = nn.Linear(256, 64)
        self.final_layer = nn.Linear(64, action_dim)
        self.optimizer = optim.Adam(self.parameters(), lr=lr)
        #self.eps = 0.00001

    def forward(self, x):
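        # Convert the (single, unbatched) state to a float32 tensor.
        x1 = np.array(x)
        x2 = F.relu(self.first_layer(torch.tensor(x1, dtype=torch.float32)))
        x3 = F.relu(self.second_layer(x2))
        x4 = self.final_layer(x3)
        # Softmax over the action dimension, then add a batch dimension.
        out = F.softmax(x4, dim=0).unsqueeze(dim=0)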

        return out

    def select_action(self, probs):
        m = Categorical(probs)
        action = m.sample()

        return action.item(), m.log_prob(action)

    def update_PFA(self, eps_rewards, eps_log_probs, gamma):
        R = 0
        policy_loss = []
        rewards = []
        for r in eps_rewards[::-1]:
            R = r + gamma * R
            rewards.insert(0, R)
        rewards = torch.tensor(rewards, dtype = torch.float)
        # Normalize the returns only if they vary; dividing by a zero std gives NaNs.
        if rewards.std() != 0:
            rewards = (rewards - rewards.mean()) / rewards.std()  # alternatively: / (rewards.std() + self.eps)
        for log_prob, reward in zip(eps_log_probs, rewards):
            # Positive sign because we want to decrease the cost,
            # i.e. not policy_loss.append(-log_prob * reward)
            policy_loss.append(log_prob * reward)


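        # The rest of this method (the loss computation and the backward() call) was cut off in the post.
        loss = ...
        nn.utils.clip_grad_value_(self.parameters(), clip_value=1.0)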

Why do I get the in-place error?

The backtrace is as follows:

These issues are often caused by using retain_graph=True in the backward() call. The computation graph will be kept alive and the next iteration tries to recompute the gradient from previous iterations using stale forward activations (assuming a parameter update took place).
Could you explain why you are using this argument?
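A minimal sketch that should reproduce the first error outside the original code, assuming a small stand-in model (the `nn.Sequential` network, the shapes, and the learning rate are illustrative, not taken from the post). The first graph is kept alive with `retain_graph=True`, `optimizer.step()` then updates the parameters in-place, and a later backward pass through the old graph finds that a saved weight has a newer version than the one recorded during the forward pass.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
# Hypothetical stand-in model, not the network from the post.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
optimizer = optim.SGD(model.parameters(), lr=0.1)

# "First iteration": build a graph and keep it alive.
loss_1 = model(torch.randn(3, 4)).sum()
optimizer.zero_grad()
loss_1.backward(retain_graph=True)
optimizer.step()  # updates the parameters in-place, bumping their version counters

# "Second iteration" accidentally accumulates onto the old graph (here via loss_1).
loss_2 = loss_1 + model(torch.randn(3, 4)).sum()
optimizer.zero_grad()
try:
    loss_2.backward()  # has to go back through the first graph with stale weights
    error = None
except RuntimeError as e:
    error = str(e)

print(error)
```

Dropping `retain_graph=True` in this sketch turns it into the second error quoted below, which is why toggling the flag only moves the problem around instead of fixing it.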

If I don't do that, then I get the following error:

Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

I searched for the above issue, and most people suggest setting retain_graph=True. I don't understand where the error is in either case.
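The second error can be reproduced in isolation: autograd frees the tensors it saved for the backward pass as soon as `backward()` finishes, so a second `backward()` that has to traverse the same graph fails. A toy sketch (the tensor `w` is only an illustration):

```python
import torch

w = torch.randn(3, requires_grad=True)
y = (w * w).sum()  # the multiplication saves w for the backward pass

y.backward()  # computes gradients, then frees the saved tensors of this graph
try:
    y.backward()  # second backward through the same, already-freed graph
    error = None
except RuntimeError as e:
    error = str(e)

print(error)
```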

Which is unfortunately wrong most of the time. On this discussion board we usually ask to either explain why it's used (valid use cases certainly exist) or to fix the original issue first.
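For reference, a sketch of a REINFORCE-style update that needs no `retain_graph`, assuming the update is called once per episode with log-probs collected during that episode only. The function name `update_policy`, the stand-in policy network, the random states, and the cost signal are hypothetical; the sign convention (`log_prob * reward`, since the rewards are treated as costs) follows the post. Gradients are clipped after `backward()` and before `step()`, since there is nothing to clip before the backward pass.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.distributions import Categorical


def update_policy(optimizer, params, eps_rewards, eps_log_probs, gamma):
    """Hypothetical REINFORCE update, called once per episode with that episode's data."""
    # Discounted returns, computed back to front.
    returns, R = [], 0.0
    for r in reversed(eps_rewards):
        R = r + gamma * R
        returns.insert(0, R)
    returns = torch.tensor(returns, dtype=torch.float32)
    if len(returns) > 1 and returns.std() != 0:
        returns = (returns - returns.mean()) / returns.std()

    # Positive sign: the rewards are treated as costs to minimize (as in the post).
    loss = torch.stack([lp * g for lp, g in zip(eps_log_probs, returns)]).sum()

    optimizer.zero_grad()
    loss.backward()  # no retain_graph: this episode's graph is freed afterwards
    nn.utils.clip_grad_value_(params, clip_value=1.0)  # clip once gradients exist
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
# Hypothetical stand-in policy with the post's 64 -> action_dim final layer.
policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 6))
optimizer = optim.Adam(policy.parameters(), lr=1e-3)

losses = []
for episode in range(2):
    eps_rewards, eps_log_probs = [], []  # fresh buffers (and a fresh graph) every episode
    for t in range(5):
        state = torch.randn(4)  # hypothetical environment state
        probs = torch.softmax(policy(state), dim=0)
        m = Categorical(probs)
        action = m.sample()
        eps_log_probs.append(m.log_prob(action))
        eps_rewards.append(float(action.item()))  # hypothetical cost signal
    losses.append(
        update_policy(optimizer, list(policy.parameters()), eps_rewards, eps_log_probs, gamma=0.99)
    )

print(losses)
```

Because the lists are recreated at the start of every episode, each `backward()` only sees log-probs from the current episode's forward passes, so neither error is triggered.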

Based on the first error, you would need to check which tensor is reused, as you are accumulating into the computation graph. I don't know exactly what your training loop looks like, but check if, e.g., the output of a previous iteration is used as a new input. In this case, .detach() the tensor before starting the new forward pass.
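A small sketch of the pattern described above, assuming a hypothetical loop where a layer's output is fed back as the next input (the `train` helper and the `nn.Linear(4, 4)` model are illustrative). Without `.detach()`, the second iteration's `backward()` tries to go through the first iteration's already-freed graph; with it, each iteration only backpropagates through its own graph.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(4, 4)  # hypothetical model
optimizer = optim.SGD(model.parameters(), lr=0.01)


def train(steps, detach):
    """Hypothetical loop that reuses the previous output as the next input."""
    state = torch.randn(1, 4)
    for _ in range(steps):
        if detach:
            # Cut the link to the previous iteration's graph.
            state = state.detach()
        state = model(state)
        loss = state.pow(2).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


train(steps=3, detach=True)  # each backward only sees the current iteration's graph

try:
    train(steps=3, detach=False)  # fails in the second iteration
    error = None
except RuntimeError as e:
    error = str(e)

print(error)
```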