Hi,
From this link:
https://discuss.pytorch.org/t/encounter-the-runtimeerror-one-of-the-variables-needed-for-gradient-computation-has-been-modified-by-an-inplace-operation/836
I learned that we sometimes cannot use in-place operations in the forward pass because they raise an error during backpropagation, and that it is suggested to call `.clone()` before the in-place operation. However, I don't understand well when `.clone()` is actually needed. For instance, in the following code:
```python
import torch
import torch.nn.functional as F

def mut(x, w, mask):
    return w * x[mask]

# Weights
w1 = torch.ones(5, requires_grad=True)
w2 = torch.ones(3, requires_grad=True)
w3 = torch.ones(3, requires_grad=True)

x = 2 * torch.ones(5)
mask = torch.tensor([True, False, True, False, True], dtype=torch.bool)

x = w1 * x
x = F.selu(x).clone()        # without this .clone(), backward() fails
x[mask] = mut(x, w2, mask)   # in-place write into x
x[mask] = F.selu(x[mask])
x[mask] = mut(x, w3, mask)   # no .clone() needed here, apparently
x.mean().backward()
```
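For reference, this is the failing variant with the `.clone()` removed; it reuses the definitions above and, on my machine, raises the RuntimeError from the linked thread when `backward()` is called (the exact message may vary across PyTorch versions):

```python
# Same as above, but without the .clone() after the first SELU.
x = 2 * torch.ones(5)
x = w1 * x
x = F.selu(x)               # no .clone() here
x[mask] = mut(x, w2, mask)  # in-place write into the SELU output
x[mask] = F.selu(x[mask])
x[mask] = mut(x, w3, mask)
# RuntimeError: one of the variables needed for gradient computation
# has been modified by an inplace operation
x.mean().backward()
```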
I need to add `.clone()` after the first SELU; if I instead add it in the next line, as `x[mask] = mut(x.clone(), w2, mask)`, it does not work. Why is this?
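Concretely, this is the variant I tried (again reusing the definitions above); it still fails with the same RuntimeError on `backward()`:

```python
x = 2 * torch.ones(5)
x = w1 * x
x = F.selu(x)                       # .clone() removed here...
x[mask] = mut(x.clone(), w2, mask)  # ...and applied to mut's input instead
x[mask] = F.selu(x[mask])
x[mask] = mut(x, w3, mask)
x.mean().backward()                 # still raises the same RuntimeError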
```

Also, it seems that I only need to use `.clone()` before the first call to `mut()`, but not before the second one. Why?
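In case it helps narrow things down, I also tried autograd's anomaly detection; as far as I understand, it makes the error include the forward traceback of the operation whose saved tensor was modified, and for me it points at the SELU (a minimal sketch of what I ran):

```python
# Sketch: anomaly mode adds the forward traceback of the op whose
# saved tensor was modified to the in-place RuntimeError.
with torch.autograd.detect_anomaly():
    x = 2 * torch.ones(5)
    x = w1 * x
    x = F.selu(x)               # no .clone()
    x[mask] = mut(x, w2, mask)  # triggers the error in backward()
    x.mean().backward()
```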
Probably I am missing something. I would be very grateful for an explanation of where and when it is necessary to clone.
Thanks a lot,
Mario