I coded a function which implements some operations including torch.mm, torch.index_select and torch.cc. However, I get an AssertionError: leaf variable was used in an inplace operation.
In the source code of Variable.py (line 199), I found the assertion, assert self.__version == 0, but it isn't clear what is going wrong here. Could anyone help me with this?
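For reference, here is a minimal sketch (not my actual function, just an illustration) of the pattern that triggers this error: an in-place write into a leaf tensor that has requires_grad=True.
import torch
w = torch.randn(4, 3, requires_grad=True)   # leaf variable
m = torch.randn(5, 4)
idx = torch.tensor([0, 2])
out = torch.mm(m, w)                        # differentiable op, fine
sub = torch.index_select(w, 0, idx)         # also fine
w[idx] = 0.0                                # in-place write into the leaf -> raises the error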
Loosely, tensors you create directly are leaf variables. Tensors that are the result of a differentiable operation are not leaf variables.
For example:
w = torch.tensor([1.0, 2.0, 3.0]) # leaf variable
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True) # also leaf variable
y = x + 1 # not a leaf variable
(The PyTorch documentation for is_leaf contains a more precise definition.)
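Checking is_leaf on the three tensors above (in a reasonably recent PyTorch) confirms this:
w.is_leaf # True
x.is_leaf # True
y.is_leaf # False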
An in-place operation is something which modifies the data of a variable. For example:
x += 1 # in-place
y = x + 1 # not in place
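One quick way to see the difference (a sketch using a tensor that does not require grad, so the in-place op is allowed): an in-place op keeps the same underlying storage, while an out-of-place op allocates a new tensor.
import torch
x = torch.tensor([1.0, 2.0, 3.0])
ptr = x.data_ptr()
x += 1 # in-place: same underlying storage
print(x.data_ptr() == ptr) # True
y = x + 1 # out-of-place: a brand-new tensor
print(y.data_ptr() == ptr) # False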
PyTorch doesn’t allow in-place operations on leaf variables that have requires_grad=True (such as the parameters of your model) because the developers could not agree on how such an operation should behave. If you want the operation to be differentiable, you can work around the limitation by cloning the leaf variable (or by using a non-in-place version of the operator).
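For example, a sketch of the clone workaround (gradients still flow back to the leaf):
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True) # leaf variable
x2 = x.clone() # x2 is not a leaf, so in-place ops on it are allowed
x2 += 1 # fine, and still differentiable
x2.sum().backward()
print(x.grad) # tensor([1., 1., 1.])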
That depends on the situation.
For example, to initialize or update parameters, assigning to .data is the way to go. Usually, you cannot backprop when changing a Variable's .data in the middle of a forward pass…
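A sketch of what that looks like for initialising, say, an embedding layer (pretrained here is just a stand-in for whatever vectors you actually want to load):
emb = torch.nn.Embedding(10, 3)
pretrained = torch.randn(10, 3) # stand-in for real pretrained vectors
emb.weight.data.copy_(pretrained) # allowed even though emb.weight is a leaf; the copy is invisible to autograd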
I’d summarise my experience as “don’t do it unless you have reason to believe it’s the right thing”.
Yes, I remember now: messing with backprop and autograd was why I was running into problems with in-place assignment before. Using .data as I currently do for initialising word embeddings seems OK then.
I'm trying to read the code for optim (I want to implement something a bit differently), and your earlier example/explanation of what a leaf Variable is doesn't seem to be valid anymore.
In particular,
you wrote
y = x + 1 # not a leaf variable
Well, here's the output from my terminal for the code you mentioned:
>>> x = torch.autograd.Variable(torch.Tensor([1, 2, 3, 4]))
>>> x.is_leaf
True
>>> y = x + 1
>>> y.is_leaf
True
>>> y
Variable containing:
2
3
4
5
[torch.FloatTensor of size (4,)]
So, can someone please explain what is and what is not a leaf Variable? Clearly a non-leaf Variable cannot be optimized, but what exactly is it?
I came across a similar issue. The reason is requires_grad.
x = torch.autograd.Variable(torch.Tensor([1, 2, 3, 4]), requires_grad=True)
x.is_leaf
#True
y = x + 1
y.is_leaf
#False
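And without requires_grad, the same y is a leaf again, which matches the output quoted above:
x = torch.autograd.Variable(torch.Tensor([1, 2, 3, 4])) # requires_grad defaults to False
y = x + 1
y.is_leaf
#True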
y = torch.autograd.Variable(torch.zeros([batch_size, c, h, w]), requires_grad=True)
Then I want to assign values to indexed parts of y, like below (y_local is a Variable computed from other variables, and I want to assign the value of y_local to part of y while ensuring that gradients from y can flow back to y_local).
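One possible workaround, sketched here under the assumption that y itself does not need to be a leaf with requires_grad=True: create y as a plain tensor and index-assign into it; the index assignment is differentiable with respect to y_local, so gradients still flow back (x_local below is just a placeholder for whatever y_local is computed from).
batch_size, c, h, w = 2, 3, 8, 8
x_local = torch.randn(batch_size, c, 4, 4, requires_grad=True)
y_local = x_local * 2 # computed from other variables
y = torch.zeros(batch_size, c, h, w) # requires_grad=False, so in-place writes are allowed
y[:, :, :4, :4] = y_local # differentiable w.r.t. y_local
y.sum().backward()
print(x_local.grad.abs().sum()) # non-zero: gradients reached x_local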
A leaf variable is, in essence, a Variable, i.e. a tensor with requires_grad=True. So if a tensor has requires_grad=False, it is not really a Variable at all, let alone a leaf variable.
Is the reason this is not "usually correct" that we could have just initialized it directly with the data we wanted in the first place, instead of doing an in-place op?
@pinocchio, I’m updating my reply and correcting the example. The not “usually correct” wasn’t a good explanation. The actual reason is that the PyTorch developers could not come to a consensus on reasonable semantics for such an operation.
I think you are wrong; y is indeed not a leaf. Maybe you had a weird version of PyTorch?
def inplace_playground():
    import torch
    x = torch.tensor([1, 2, 3.], requires_grad=True)
    y = x + 1
    print(f'x.is_leaf = {x.is_leaf}')
    print(f'y.is_leaf = {y.is_leaf}')
    x += 1  # raises the in-place-on-leaf error, since x is a leaf that requires grad
output:
x.is_leaf = True
y.is_leaf = False
@colesbury I think you were correct. Not sure what you corrected, but I tried the leaf thing and it seems you're right that y is not a leaf (as expected).