I enabled anomaly detection, and it pointed to torch.stack. I was using CUDA and mixed-precision training at the same time, and I suspect that combination was related. I tried to reproduce the error in a minimal example, but I couldn't trigger it again. However, in the larger project, I fixed the issue by restructuring my code so that neighs is set to all_neighbors[[indexes of local neighbors]] instead.
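For anyone hitting the same error: below is a minimal sketch (with hypothetical tensor names) of the failure mode and of an out-of-place fix in the spirit of the restructuring described above. The first part modifies, in place, a tensor whose value autograd saved for backward, which bumps its version counter and triggers the same "modified by an inplace operation" RuntimeError; the second part builds neighs by advanced indexing, which creates a new tensor instead of writing into one the graph already depends on.

```python
import torch

# Failure mode: sigmoid saves its output for backward, so editing that
# output in place invalidates the saved tensor's version counter.
x = torch.randn(3, requires_grad=True)
y = x.sigmoid()                      # y is saved by SigmoidBackward
s = torch.stack([y, y]).sum()
y.add_(1.0)                          # in-place edit bumps y's version
caught = False
try:
    s.backward()
except RuntimeError:
    caught = True                    # "...modified by an inplace operation"

# Out-of-place alternative: select the local neighbors by indexing,
# which returns a fresh tensor, so nothing saved for backward is mutated.
all_neighbors = torch.randn(10, 5, requires_grad=True)
local_idx = torch.tensor([1, 4, 7])          # hypothetical local indices
neighs = all_neighbors[local_idx]            # new tensor, no in-place write
out = torch.stack(list(neighs)).sum(dim=0)   # same stack-then-sum pattern
out.sum().backward()                         # backward now succeeds
```

This is only an illustration of the error class, not the LIGN code itself; the actual fix in my project was the all_neighbors indexing described above.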
This is the full output:
[W ..\torch\csrc\autograd\python_anomaly_mode.cpp:60] Warning: Error detected in StackBackward. Traceback of forward call that caused the error:
  File ".\cora.py", line 127, in <module>
    sub_graph_size=SUBGRPAH_SIZE)
  File "C:\Users\josue\anaconda3\envs\LIGN\lib\site-packages\lign\train.py", line 126, in superv
    out = base(full_graph, inp) if is_base_gcn else base(inp)
  File "C:\Users\josue\anaconda3\envs\LIGN\lib\site-packages\torch\nn\modules\module.py", line 726, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\josue\anaconda3\envs\LIGN\lib\site-packages\lign\models\CORA.py", line 20, in forward
    x = F.relu(self.unit3(g, x))
  File "C:\Users\josue\anaconda3\envs\LIGN\lib\site-packages\torch\nn\modules\module.py", line 726, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\josue\anaconda3\envs\LIGN\lib\site-packages\lign\nn.py", line 19, in forward
    g.push(func = self.aggregation, data = "__hidden__")
    out = func(out)
  File "C:\Users\josue\anaconda3\envs\LIGN\lib\site-packages\lign\utils\functions.py", line 71, in sum_tensors
    return th.stack(neighs).sum(dim = 0)
 (function print_stack)
Traceback (most recent call last):
  File ".\cora.py", line 127, in <module>
    sub_graph_size=SUBGRPAH_SIZE)
  File "C:\Users\josue\anaconda3\envs\LIGN\lib\site-packages\lign\train.py", line 133, in superv
    scaler.scale(loss).backward()
  File "C:\Users\josue\anaconda3\envs\LIGN\lib\site-packages\torch\tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\josue\anaconda3\envs\LIGN\lib\site-packages\torch\autograd\__init__.py", line 127, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.HalfTensor [50]], which is output 0 of AsStridedBackward, is at version 2256; expected version 2255 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
If you still want to see the full project, it is on GitHub (in an older commit now). To reproduce the error, run performance/cora.py while inside the performance directory.