Runtime error in gradient computation

Rashmi_S_Murthy · December 13, 2021, 8:15am

Hi,
When I try to train my model, I get the following error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [20, 256, 1, 1]] is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Any idea what is causing this?
Thanks.

ptrblck · December 15, 2021, 9:14am

This error is usually raised by disallowed in-place operations, which modify tensors that are still needed to compute the gradients in the backward pass.
Here is a small example:

```python
import torch
import torch.nn as nn

# works
w = nn.Parameter(torch.randn(1, 1))
x = torch.randn(1, 1)

y = w * x
y.backward()  # works

# fails
w = nn.Parameter(torch.randn(1, 1))
x = torch.randn(1, 1)

y = w * x
x += 1  # x is needed to compute the gradient w.r.t. w in the multiplication, so it must not be modified in place
y.backward()
# > RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1, 1]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```
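The "version" numbers in the message come from a per-tensor version counter: when autograd saves a tensor for the backward pass it also records that tensor's current version, every in-place op increments the counter, and `backward()` raises if the two no longer match. A minimal sketch that inspects the counter via `Tensor._version`, which is an internal, undocumented attribute, so treat it only as a debugging aid (the names `saved_version` and `failed` are just for illustration):

```python
import torch
import torch.nn as nn

w = nn.Parameter(torch.randn(1, 1))
x = torch.randn(1, 1)

y = w * x                   # autograd saves x (needed for dy/dw) and remembers its version
saved_version = x._version  # internal attribute: current value of x's version counter

x += 1                      # the in-place add bumps x's version counter
print(saved_version, x._version)  # the second number is one higher than the first

failed = False
try:
    y.backward()            # autograd sees that the saved x has a newer version
except RuntimeError as e:
    failed = True
    print("backward failed:", str(e).splitlines()[0])
```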

Check your code for such in-place manipulations and replace them with their out-of-place equivalents, e.g. `x = x + 1` instead of `x += 1`.
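Applied to the failing example above, a sketch of the fix could look like this: the out-of-place `x = x + 1` creates a new tensor and only rebinds the name, so the tensor saved by the multiplication stays untouched and `backward()` succeeds. The same issue can hit tensors inside a model; as an illustrative second case (the tensor `z` and the `* 2` scaling are hypothetical), `torch.sigmoid` saves its *output* for its own backward, so modifying that output in place fails, while the out-of-place version works.

```python
import torch
import torch.nn as nn

# Case 1: the failing example from above, with the in-place update replaced
w = nn.Parameter(torch.randn(1, 1))
x = torch.randn(1, 1)
x_used = x                # reference to the tensor the multiplication saved

y = w * x
x = x + 1                 # out-of-place: builds a new tensor, x_used is untouched
y.backward()              # works, and dy/dw equals the original x
print(torch.equal(w.grad, x_used))  # True

# Case 2: torch.sigmoid saves its output for the backward pass,
# so modifying that output in place triggers the same error
z = torch.randn(4, requires_grad=True)
s = torch.sigmoid(z)
s.mul_(2)                 # in-place on a tensor autograd still needs
try:
    s.sum().backward()
    inplace_failed = False
except RuntimeError:
    inplace_failed = True
print(inplace_failed)     # True

# Out-of-place equivalent: the saved sigmoid output stays intact
s = torch.sigmoid(z)
s = s * 2
s.sum().backward()        # d/dz sum(2 * sigmoid(z)) = 2 * sigmoid(z) * (1 - sigmoid(z))
```

If the offending operation is hard to find in a larger model, running the forward and backward pass under `torch.autograd.set_detect_anomaly(True)` makes the error message point to the operation whose saved tensor was modified.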