Confused about the autograd mechanism in PyTorch 1.8.0

Hi, I recently updated PyTorch from 1.4.0 to 1.8.0. However, when I run my code without any changes, an error occurs:

Traceback (most recent call last):
  File "Loss.py", line 768, in <module>
    lossB.backward()
  File "/Users/opt/anaconda3/envs/torch_18/lib/python3.7/site-packages/torch/tensor.py", line 233, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/Users/opt/anaconda3/envs/torch_18/lib/python3.7/site-packages/torch/autograd/__init__.py", line 146, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [16]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

So I enabled torch.autograd.set_detect_anomaly(True) and got this:

/Users/opt/anaconda3/envs/torch_18/lib/python3.7/site-packages/torch/autograd/__init__.py:146: UserWarning: Error detected in MseLossBackward. Traceback of forward call that caused the error:
  File "Loss.py", line 755, in <module>
    net_param2=list(modelB.parameters()))
  File "Loss.py", line 335, in loss_cocorrecting_plus
    loss_net = self._net_loss(net_param1, net_param2)
  File "Loss.py", line 105, in _net_loss
    loss += torch.nn.functional.mse_loss(param1, param2)
  File "/Users/opt/anaconda3/envs/torch_18/lib/python3.7/site-packages/torch/nn/functional.py", line 2631, in mse_loss
    return torch._C._nn.mse_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
 (Triggered internally at  /Users/distiller/project/conda/conda-bld/pytorch_1607242180650/work/torch/csrc/autograd/python_anomaly_mode.cpp:104.)
  allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
Traceback (most recent call last):
  File "Loss.py", line 768, in <module>
    lossB.backward()
  File "/Users/opt/anaconda3/envs/torch_18/lib/python3.7/site-packages/torch/tensor.py", line 233, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/Users/opt/anaconda3/envs/torch_18/lib/python3.7/site-packages/torch/autograd/__init__.py", line 146, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [16]] is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Here is a simplified version of my code:

import torch
import torch.nn.functional as F

def cal_net_dis(net1_param, net2_param):
    # Sum of the MSE distances between corresponding parameters of the two nets
    loss = 0
    for param1, param2 in zip(net1_param, net2_param):
        loss += F.mse_loss(param1, param2)
    return loss

# CNN and dataloader are defined elsewhere (not shown)
modelA = CNN()
modelB = CNN()
optimizerA = torch.optim.SGD(modelA.parameters(), lr=0.01)
optimizerB = torch.optim.SGD(modelB.parameters(), lr=0.01)

for img, target in dataloader:
    outputA = modelA(img)
    outputB = modelB(img)

    lossA_ = F.cross_entropy(outputA, target)
    lossB_ = F.cross_entropy(outputB, target)
    net_dis = cal_net_dis(list(modelA.parameters()), list(modelB.parameters()))
    lossA = lossA_ + net_dis
    lossB = lossB_ + net_dis

    optimizerA.zero_grad()
    lossA.backward(retain_graph=True)
    optimizerA.step()
    optimizerB.zero_grad()
    lossB.backward()
    optimizerB.step()
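For anyone who wants to reproduce this without the original CNN and dataloader, here is a minimal self-contained sketch. It assumes two small nn.Linear models and one random batch as hypothetical stand-ins; on recent PyTorch versions it should hit the same in-place RuntimeError, which the function catches and returns as a string.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def cal_net_dis(net1_param, net2_param):
    # Sum of the MSE distances between corresponding parameters of the two nets
    loss = 0
    for param1, param2 in zip(net1_param, net2_param):
        loss = loss + F.mse_loss(param1, param2)
    return loss


def reproduce():
    torch.manual_seed(0)
    # Hypothetical stand-ins for CNN() and the dataloader
    modelA, modelB = nn.Linear(8, 3), nn.Linear(8, 3)
    optimizerA = torch.optim.SGD(modelA.parameters(), lr=0.01)
    optimizerB = torch.optim.SGD(modelB.parameters(), lr=0.01)
    img, target = torch.randn(4, 8), torch.randint(0, 3, (4,))

    net_dis = cal_net_dis(list(modelA.parameters()), list(modelB.parameters()))
    lossA = F.cross_entropy(modelA(img), target) + net_dis
    lossB = F.cross_entropy(modelB(img), target) + net_dis

    optimizerA.zero_grad()
    lossA.backward(retain_graph=True)
    optimizerA.step()  # updates modelA's parameters in place
    optimizerB.zero_grad()
    try:
        # MseLossBackward saved modelA's parameters; step() has since modified them in place
        lossB.backward()
    except RuntimeError as err:
        return str(err)
    return None  # no error raised
```

Calling print(reproduce()) shows the "modified by an inplace operation" message.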

When I move optimizerA.step() after lossB.backward(), the error disappears:

optimizerA.zero_grad()
lossA.backward(retain_graph=True)
optimizerB.zero_grad()
lossB.backward()
optimizerA.step()
optimizerB.step()

I am very confused. Is this reordering safe? Is there a recommended way to do this, or should I keep the current code?

Hi,

The issue is that older versions of PyTorch did not properly treat the optimizer step as an in-place operation. As a result, this check did not work: it raised no error even though the computed gradients were wrong. This has been fixed in the latest master, so this error is expected in this case.
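One way to see this tracking at work is the tensor's version counter, which autograd compares against the version recorded when a tensor was saved for backward. The sketch below reads `_version`, a private and undocumented attribute used here only for illustration, on a hypothetical tiny nn.Linear model.

```python
import torch
import torch.nn as nn


def version_before_and_after_step():
    model = nn.Linear(2, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    model(torch.randn(5, 2)).sum().backward()  # writes to .grad, not to the weight itself
    before = model.weight._version
    opt.step()  # in-place update of model.weight, which bumps its version counter
    after = model.weight._version
    return before, after
```

Any backward node that saved model.weight before the step would now see a newer version than it expects, which is exactly what the error message reports.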

The problem is that you need the values of the parameters to compute the backward pass, but optimizer.step() modifies them in place. So you either need to wait until all the backward calls are done before doing the step, or you need to redo the forward pass after the step so that it uses the new values of the weights.
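Both options can be sketched as follows, again with hypothetical nn.Linear stand-ins for the two CNNs. In the first option, the `inputs=` argument of `backward()` (added in PyTorch 1.8; note the `inputs=inputs` call in the traceback above) restricts each backward pass to one model's parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def net_dis(m1, m2):
    # Sum of MSE distances between corresponding parameters
    return sum(F.mse_loss(p1, p2) for p1, p2 in zip(m1.parameters(), m2.parameters()))


def make_setup(seed=0):
    # Hypothetical stand-ins for the two CNNs, their optimizers, and one batch
    torch.manual_seed(seed)
    modelA, modelB = nn.Linear(8, 3), nn.Linear(8, 3)
    optA = torch.optim.SGD(modelA.parameters(), lr=0.01)
    optB = torch.optim.SGD(modelB.parameters(), lr=0.01)
    img, target = torch.randn(4, 8), torch.randint(0, 3, (4,))
    return modelA, modelB, optA, optB, img, target


def option1_step_after_all_backward():
    modelA, modelB, optA, optB, img, target = make_setup()
    dis = net_dis(modelA, modelB)
    lossA = F.cross_entropy(modelA(img), target) + dis
    lossB = F.cross_entropy(modelB(img), target) + dis
    optA.zero_grad()
    optB.zero_grad()
    # Each loss only accumulates into its own model's .grad
    lossA.backward(retain_graph=True, inputs=list(modelA.parameters()))
    lossB.backward(inputs=list(modelB.parameters()))
    optA.step()  # safe: no pending backward pass needs the old values
    optB.step()
    return modelA, modelB


def option2_redo_forward():
    modelA, modelB, optA, optB, img, target = make_setup()
    lossA = F.cross_entropy(modelA(img), target) + net_dis(modelA, modelB)
    optA.zero_grad()
    lossA.backward()
    optA.step()
    # Rebuild the graph so lossB sees the updated modelA weights
    lossB = F.cross_entropy(modelB(img), target) + net_dis(modelA, modelB)
    optB.zero_grad()  # also clears what lossA.backward() left on modelB
    lossB.backward()
    optB.step()
    return modelA, modelB
```

Without `inputs=`, the reordered code from the question still runs, but lossB.backward() also accumulates the net_dis gradient into modelA's .grad before optimizerA.step(), so modelA receives that term twice. Restricting the inputs keeps each optimizer's gradient equal to the gradient of its own loss.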


Thanks, it seems I should recheck my previous results.