I can't find the error.

    def update(self, log_prob1, log_prob2, value, reward):
        # actor-critic style update: advantage is the reward minus the value estimate
        advantage = reward - value

        # policy loss for both action heads, value loss against the reward
        policy_loss = ((-log_prob1 * advantage) + (-log_prob2 * advantage)).mean()
        value_loss = F.smooth_l1_loss(value, reward)
        loss = policy_loss + value_loss

        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

In loss.backward() I get a runtime error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [16, 1]], which is output 0 of SqrtBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

These variables are 16x1 torch tensors; where is the in-place operation?
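From the hint in the traceback I understand that anomaly detection should be enabled before the training loop. A rough sketch of where the flag would go (dummy model and data, not my actual code):

    import torch
    import torch.nn as nn

    # with this flag, a backward() error also prints a traceback of the forward op that failed
    torch.autograd.set_detect_anomaly(True)

    # dummy model and data, only to show where the flag is set
    model = nn.Linear(4, 1)
    optimizer = torch.optim.Adam(model.parameters())

    x = torch.randn(16, 4)
    target = torch.randn(16, 1)

    loss = nn.functional.mse_loss(model(x), target)
    optimizer.zero_grad()
    loss.backward()  # an in-place error here would now point at the offending forward line
    optimizer.step()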

I can't see any in-place operations in your code snippet.
Could you post dummy tensor shapes so that we can reproduce this issue?
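For what it's worth, this message usually means that a tensor autograd saved for the backward pass was modified in-place afterwards. A minimal toy example (unrelated to your code) that produces the same SqrtBackward complaint:

    import torch

    x = torch.full((16, 1), 2.0, requires_grad=True)
    y = x.sqrt()        # SqrtBackward saves its output y for the gradient 1 / (2 * y)
    y += 1              # in-place op bumps y's version counter to 1
    y.sum().backward()  # RuntimeError: ... output 0 of SqrtBackward, is at version 1; expected version 0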

                    rewards = []
                    for a1, a2 in zip(action1, action2):
                        rewards.append(loader[mode].dataset.get_reward(state.item(), a1[0], a1[1], a2[0], a2[1]))

                    rewards = torch.Tensor(rewards).view(-1, 1).cuda()  # BS x 1
                    log_prob1 = log_prob1.view(-1, 1)
                    log_prob2 = log_prob2.view(-1, 1)
                    state_value = state_value.view(-1, 1)

                    if mode == 'train':
                        loss_to_plot, policy_to_plot, reward_to_plot, value_to_plot = estimator.update(
                            log_prob1, log_prob2, state_value, rewards)



This is the code that calls the method above.
All tensors are 16x1; the only structural difference is that the reward tensor is missing "grad_fn=<ViewBackward>". I can't see any in-place operation either.
I print these tensors:



tensor([[-2.2734],
[-2.3429],
[-2.3219],
[-1.6636],
[-3.4040],
[-3.1213],
[-4.9740],
[-2.1452],
[-2.5107],
[-4.0225],
[-3.3761],
[-1.9465],
[-3.5561],
[-1.5919],
[-1.8437],
[-2.8795]], device='cuda:0', grad_fn=<ViewBackward>)
tensor([[-1.7228],
[-2.8122],
[-2.3623],
[-2.4175],
[-1.9202],
[-1.6540],
[-3.5124],
[-1.7577],
[-5.4544],
[-1.6919],
[-3.0612],
[-3.8174],
[-2.7434],
[-4.5231],
[-3.1026],
[-2.8684]], device='cuda:0', grad_fn=<ViewBackward>)
tensor([[-0.0289],
[ 0.0604],
[ 0.0796],
[-0.0133],
[-0.1071],
[-0.0423],
[-0.0217],
[ 0.1402],
[-0.0927],
[-0.0896],
[ 0.0655],
[ 0.0261],
[ 0.0024],
[ 0.0145],
[-0.0375],
[ 0.0536]], device='cuda:0', grad_fn=<ViewBackward>)
tensor([[1.0000],
[0.5000],
[0.0000],
[0.5000],
[1.0000],
[0.0000],
[0.0000],
[0.0000],
[1.0000],
[1.0000],
[0.0000],
[0.5000],
[0.5000],
[0.5000],
[0.5000],
[0.0000]], device='cuda:0')



I printed them in the same order in which I pass them to update().
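To narrow it down further, I could also print, right before the update() call, the version counter that the error message refers to. A rough debugging sketch with dummy 16x1 stand-ins (._version is an internal autograd attribute, so this is only for inspection):

    import torch

    # dummy 16x1 stand-ins for the tensors I pass to update()
    log_prob = torch.randn(16, 1, requires_grad=True) * 1.0  # non-leaf, has a grad_fn
    rewards = torch.rand(16, 1)                               # plain tensor, no grad_fn (like my rewards)

    for name, t in [('log_prob', log_prob), ('rewards', rewards)]:
        print(name, tuple(t.shape), 'version:', t._version, 'grad_fn:', t.grad_fn)

    rewards.add_(1.0)  # any in-place op bumps the version counter
    print('rewards version after in-place op:', rewards._version)  # -> 1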