Backward error after in-place modification only when using Tanh

Hello all!
I would like to understand the behavior of backward on a very simple configuration, which is attached below.
I am simply computing gradients through two linear layers, with a nonlinearity in between and an in-place operation that sets some elements to 0. I think the code speaks for itself.

The call to backward raises a RuntimeError related to an in-place operation.
However, this error is raised only with the Tanh activation function, not with ReLU. I tried different functions and noticed that only the ones without an inplace parameter in their signature raise the error (e.g. tanh, sigmoid).
My guess is that the in-place operation does not raise an error if the gradient computation does not require the modified tensor. But ReLU does actually need the input in order to tell whether it is larger than zero (derivative = 1) or not (derivative = 0), so this alone does not explain the difference to me.
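
To make this concrete, here is a small self-contained check of the derivative formulas I have in mind (just an illustration, using torch.autograd.grad):

import torch

x = torch.randn(5, requires_grad=True)

# tanh'(x) = 1 - tanh(x)^2: the derivative can be written in terms of the *output*
y = torch.tanh(x)
(g_tanh,) = torch.autograd.grad(y.sum(), x)
print(torch.allclose(g_tanh, 1 - y.detach() ** 2))  # True

# relu'(x) = 1 if x > 0 else 0: only the sign of the *input* matters
z = torch.relu(x)
(g_relu,) = torch.autograd.grad(z.sum(), x)
print(torch.allclose(g_relu, (x > 0).float()))  # True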

Does anyone have a good explanation for this behaviour?
Thanks a lot!

P.S. The error can also be fixed by cloning tensor h and modifying the clone instead of the original. But here I am more interested in the explanation of the behavior than in the fix.

import torch
import torch.nn as nn
from itertools import chain

with torch.autograd.set_detect_anomaly(True):
    l1 = nn.Linear(7, 10)
    l2 = nn.Linear(10, 3)


    criterion = torch.nn.CrossEntropyLoss()
    optim = torch.optim.Adam(chain(l1.parameters(), l2.parameters()))
    optim.zero_grad()

    x = torch.randn(2, 7)
    y = torch.tensor([0,2]).long()
    h = l1(x)

    # tanh triggers the RuntimeError
    h = torch.tanh(h)
    # ReLU does not trigger the RuntimeError
    # h = torch.relu(h)

    # in-place operation: set some elements to 0
    h[:, torch.tensor([1, 4, 6]).long()] = 0.
    out = l2(h)

    loss = criterion(out, y)
    loss.backward()
    optim.step()

EDIT:
The error raised at runtime is the following:

Warning: Traceback of forward call that caused the error:
  File "test.py", line 18, in <module>
    h = torch.tanh(h)
Traceback (most recent call last):
  File "test.py", line 28, in <module>
    loss.backward()
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2, 10]], which is output 0 of TanhBackward, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

I faced the same issue even with the latest pytorch 1.7.0.a0*: “Warning: Error detected in TanhBackward.”, “RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation”. I resolved it by cloning the tensor (as you suggested).
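
In case it is useful to someone, the clone-based fix applied to the example above would look roughly like this (only the lines around the in-place assignment change):

h = torch.tanh(l1(x))
h = h.clone()                               # work on a copy; tanh's saved output stays untouched
h[:, torch.tensor([1, 4, 6]).long()] = 0.   # the in-place write now hits only the clone
out = l2(h)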
