Clone to avoid in-place operations and runtime error during gradient calculation

Hi,

From this link:
https://discuss.pytorch.org/t/encounter-the-runtimeerror-one-of-the-variables-needed-for-gradient-computation-has-been-modified-by-an-inplace-operation/836
I learned that sometimes we cannot use in-place operations in the forward pass because they cause an error during backpropagation, and the suggested fix is to call .clone() before the in-place operation. However, I don’t understand well when .clone() is actually needed. For instance, in the following code:

import torch
import torch.nn.functional as F

def mut(x, w, mask): return w*x[mask]

# Weights
w1 = torch.ones(5, requires_grad=True)
w2 = torch.ones(3, requires_grad=True)
w3 = torch.ones(3, requires_grad=True)

x = 2*torch.ones(5)
mask = torch.tensor([True, False, True, False, True], dtype=torch.bool)


x = w1*x
x = F.selu(x).clone()   # without this .clone(), backward() raises the in-place error
x[mask] = mut(x, w2, mask)
x[mask] = F.selu(x[mask])
x[mask] = mut(x, w3, mask)
x.mean().backward()

I need to add .clone() after the first SELU; if I instead add it on the next line, i.e. x[mask] = mut(x.clone(), w2, mask), it does not work. Why is this?

Also, it seems that I only need to use .clone() before the first call to mut(), but not before the second one. Why?

Probably I am missing something. I would be very grateful for an explanation of where and when cloning is needed.

Thanks a lot,

Mario

Let’s disambiguate things first. This works:
a = F.selu(x)
b = a.clone()
b[mask] = mut(b, w2, mask)      # all in-place writes go into b, a clone of selu's output
b[mask] = F.selu(b[mask])
b[mask] = mut(b, w3, mask)

Your breaking change:
a = F.selu(x)
b = a.clone()
a[mask] = mut(b, w2, mask)      # in-place write into a, the tensor kept around by selu's backward
a[mask] = F.selu(a[mask])
a[mask] = mut(a, w3, mask)

This suggests that “a” is referenced by some backward function, which is what forbids the “a[mask] = …” ops; no such problematic reference exists in the first version, which only writes into “b”.

The obvious culprit is F.selu itself, which keeps a reference to its output. It is hard to tell in general when this happens: the autograd code generation manages the stored tensors, and AFAIK this is not documented.

Relevant entry in tools/autograd/derivatives.yaml:
self: elu_backward(grad, alpha, scale, input_scale, result)
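
For illustration, here is a minimal sketch of the same failure mode. I use torch.exp instead of F.selu, since exp’s backward formula (grad * result) definitely needs the saved output; the tensors are just made up for the demo:

import torch

x = torch.ones(3, requires_grad=True)

# exp's derivative is grad * result, so autograd saves exp's output for backward
a = torch.exp(x)
a[0] = 0.0                # in-place write bumps the version counter of that saved tensor
try:
    a.sum().backward()
except RuntimeError as e:
    print(e)              # "... has been modified by an inplace operation"

# cloning first: the write hits the clone, not the tensor saved by exp's backward
y = torch.ones(3, requires_grad=True)
b = torch.exp(y).clone()
b[0] = 0.0
b.sum().backward()        # fine
print(y.grad)             # tensor([0.0000, 2.7183, 2.7183])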

You’re accidentally avoiding problems inside mut(), because the x[mask] expression there copies the selected values of x into new memory. Note that this behaviour is different from the “x[mask] = …” in-place assignments, which are handled by __setitem__.
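
A quick sketch of that difference, with made-up values:

import torch

x = torch.arange(5.)
mask = torch.tensor([True, False, True, False, True])

y = x[mask]      # advanced indexing: y is a new tensor holding copied values
y[0] = 100.0
print(x)         # x is unchanged: tensor([0., 1., 2., 3., 4.])

x[mask] = 0.0    # __setitem__: writes into x's own memory, i.e. in-place
print(x)         # tensor([0., 1., 0., 3., 0.])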

Understood. Thank you very much for your explanation :smiley: