Calling loss.backward() fails with: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

To Reproduce

Steps to reproduce the behavior:

  1. I built a custom model with PyTorch. Training works fine on a CUDA GPU, but when I run the same model on CPU, loss.backward() raises the error above.
  2. I have already tried reinstalling several PyTorch versions (1.5.0, 1.6.0, 1.8.0), but none of them helps.

code:

optimizer = torch.optim.Adamax(net.parameters(), lr=2e-3)
loss_func = torch.nn.CrossEntropyLoss()

# forward pass
output = net([train1[0], train1[1], train1[2], train1[3]])
loss = loss_func(output, train2)

# running metrics
total_correct += get_accuracy(output, train2)
ml_loss.update(loss, batch_size * 30)
acc.update(total_correct, batch_size * 30)

optimizer.zero_grad()
# backward propagation
loss.backward()
# weight update
optimizer.step()

traceback:

    loss.backward()
  File "/Users/leo/.pyenv/versions/3.7.7/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/Users/leo/.pyenv/versions/3.7.7/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [300, 300, 300]], which is output 0 of SelectBackward, is at version 16; expected version 15 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
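For reference, here is a minimal, self-contained sketch (unrelated to the model above) of the pattern autograd is complaining about: a tensor that an earlier operation saved for its backward pass is modified in place afterwards, so its version counter no longer matches when backward() runs.

import torch

x = torch.randn(3, requires_grad=True)
y = torch.sigmoid(x)   # sigmoid's backward needs its output y, so y is saved
loss = y.sum()
y.add_(1.0)            # in-place edit bumps y's version counter
loss.backward()        # raises: "... modified by an inplace operation"

In the reported traceback the saved tensor comes from a SelectBackward node (i.e., an indexing/slicing operation somewhere in the model), and some later in-place update touches it before the backward pass.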

Expected behavior

The model script runs to completion without errors.

Environment

  • PyTorch Version: 1.5.0
  • OS (e.g., Linux): macOS
  • How you installed PyTorch (conda, pip, source): pip install torch==1.5.0
  • Build command you used (if compiling from source):
  • Python version: 3.7.7
  • CUDA/cuDNN version: none (CPU only)
  • GPU models and configuration:
  • Any other relevant information:

Hey,

As mentioned in the error message, have you tried setting torch.autograd.set_detect_anomaly(True) to see whether it gives you more useful information about which forward-pass operation is responsible?
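As a sketch (assuming the training snippet above), anomaly detection can be enabled globally or limited to the step under investigation:

import torch

# Enable globally before running the training step; the backward pass will
# then also report the forward-pass operation that produced the bad gradient.
torch.autograd.set_detect_anomaly(True)

# ... build net, optimizer, loss_func and run the forward pass as above ...

# Alternatively, restrict the extra overhead to the failing step:
with torch.autograd.detect_anomaly():
    loss.backward()

Note that anomaly detection slows training down noticeably, so it is best used only while debugging.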