def update(self, log_prob1, log_prob2, value, reward):
    advantage = reward - value
    policy_loss = ((-log_prob1 * advantage) + (-log_prob2 * advantage)).mean()
    value_loss = F.smooth_l1_loss(value, reward)
    loss = policy_loss + value_loss
    self.optimizer.zero_grad()
    loss.backward()
    self.optimizer.step()
At loss.backward() I get a runtime error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [16, 1]], which is output 0 of SqrtBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
These variables are 16x1 torch tensors. Where is the inplace operation?
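For context, the class of error in the traceback can be reproduced minimally. This is a hypothetical sketch, not the asker's model: SqrtBackward saves the *output* of sqrt() for the backward pass (since d sqrt(x)/dx = 1/(2*sqrt(x))), so any in-place modification of that output between forward and backward bumps its version counter and raises exactly this RuntimeError.

```python
import torch

x = torch.ones(16, 1, requires_grad=True)
y = x.sqrt()   # SqrtBackward saves y for the backward pass (y is at version 0)
y += 1         # in-place op on a saved tensor: y is now at version 1

error_raised = False
try:
    y.sum().backward()  # autograd notices the version mismatch and raises
except RuntimeError as err:
    error_raised = True
    print(type(err).__name__)  # RuntimeError: "... modified by an inplace operation ..."

# As the hint suggests, calling torch.autograd.set_detect_anomaly(True)
# before the forward pass makes the failing backward also print the traceback
# of the forward operation that created the offending tensor.
```

Since the snippet in the question contains no sqrt, the in-place write most likely happens elsewhere (for example an in-place op like `+=`, `.add_()`, or an indexed assignment applied to `value`, `reward`, or something inside the model/environment between the forward pass and `loss.backward()`); anomaly detection should point at the exact line.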