In-place operation when indexing target Variable with ByteTensor

dpernes · April 2, 2018, 4:46pm

I have a problem where I need to assign values to a Variable that requires grad, indexing it with a ByteTensor mask. However, this seems to be an in-place operation, so PyTorch throws an error:

RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.

The following piece of code reproduces the error in a very simple way:

import torch
from torch.autograd import Variable

x = torch.LongTensor([0,1,0,1,1,0])
y = Variable(torch.zeros(x.size()), requires_grad=True)

y[x == 0] = Variable(torch.Tensor([-5]))  # raises the RuntimeError above

print('y',y)

Clearly, the intended output would be:

y Variable containing:
-5
 0
-5
 0
 0
-5
[torch.FloatTensor of size (6,)]
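Since PyTorch 0.4, Variable has been merged into Tensor, so in current releases the same situation can be written with plain tensors. The sketch below assumes a recent PyTorch version and simply catches the error, showing that assigning through a boolean mask into a leaf that requires grad is still rejected:

```python
import torch

x = torch.tensor([0, 1, 0, 1, 1, 0])
y = torch.zeros(x.size(), requires_grad=True)  # leaf tensor that requires grad

error_message = None
try:
    # Boolean-mask assignment writes into y itself: an in-place op on a leaf
    y[x == 0] = -5.0
except RuntimeError as err:
    error_message = str(err)

print(error_message)
```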

As a workaround, we can use an explicit for-loop over the elements of x:

import torch
from torch.autograd import Variable

x = torch.LongTensor([0,1,0,1,1,0])
y = Variable(torch.zeros(x.size()), requires_grad=True)

for i in range(x.size(0)):
  if x[i] == 0:
    y[i] = Variable(torch.Tensor([-5]))

print('y',y)

Here, y[i] = Variable(torch.Tensor([-5])) is apparently not treated as an in-place operation (I don’t know why), so it works. However, this for-loop is very inefficient and takes a lot of time when x is large.
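One vectorized alternative, not mentioned in this thread, is to avoid the in-place write altogether and build a new tensor with torch.where. This is a sketch assuming a recent PyTorch (plain tensors instead of Variable); torch.full_like is used only to supply the replacement value -5 with the same shape and dtype as y:

```python
import torch

x = torch.tensor([0, 1, 0, 1, 1, 0])
y = torch.zeros(x.size(), requires_grad=True)

# torch.where returns a new tensor instead of writing into y, so y stays untouched
out = torch.where(x == 0, torch.full_like(y, -5.0), y)
print(out)  # -5 where x == 0, 0 elsewhere

# Gradients reach y only at the positions that were taken from y
out.sum().backward()
print(y.grad)  # 0 where x == 0, 1 elsewhere
```

Because nothing is written into y, y remains a valid leaf, and its gradient is simply zero at the masked positions.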

Are there any alternatives? Thanks in advance.

albanD · April 3, 2018, 9:31am

Hi,

Since gradients are needed for y itself, you can’t modify it in-place.
What you can do is clone it first: the clone is not a leaf Variable and can therefore be modified in-place:

import torch
from torch.autograd import Variable

x = torch.LongTensor([0,1,0,1,1,0])
y = Variable(torch.zeros(x.size()), requires_grad=True)

tmp = y.clone()  # non-leaf copy that is still connected to y in the graph
tmp[x == 0] = Variable(torch.Tensor([-5]))  # in-place write on the clone is allowed

print('y',y)
print('tmp',tmp)
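To check that the clone keeps the gradient path back to y, the sketch below (assuming a recent PyTorch, with plain tensors instead of Variable) backpropagates through tmp. Positions overwritten with -5 no longer depend on y, so their gradient is zero:

```python
import torch

x = torch.tensor([0, 1, 0, 1, 1, 0])
y = torch.zeros(x.size(), requires_grad=True)

tmp = y.clone()     # non-leaf copy, still part of y's graph
tmp[x == 0] = -5.0  # in-place write on the clone is allowed

tmp.sum().backward()
print(tmp)     # -5 where x == 0, 0 elsewhere
print(y.grad)  # 0 where the clone was overwritten, 1 elsewhere
```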

dpernes · April 3, 2018, 10:06am

Hi,

Thank you for your reply!
That’s rather surprising behavior. Just out of curiosity, why can’t leaf Variables be changed in-place?

albanD · April 3, 2018, 10:09am

Because a gradient update for a leaf Variable a is (with extra terms if you use a fancier optimizer) a = a - lambda * a.grad. As you can see, you need the original value of a to perform this update. This is why you cannot change this value in-place: otherwise you could no longer do this update, since you would have lost the original a.
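To make the update rule concrete, here is a minimal sketch of one plain gradient-descent step written by hand, assuming a recent PyTorch (where Variable and Tensor are merged). The name lr plays the role of lambda and its value is arbitrary. The step itself is applied under torch.no_grad(), so autograd does not record it:

```python
import torch

lr = 0.1  # illustrative learning rate (assumption)
a = torch.tensor([1.0, 2.0], requires_grad=True)

loss = (a ** 2).sum()
loss.backward()  # a.grad = 2 * a = [2., 4.]

# Gradient descent: a <- a - lr * a.grad, which uses the current value of a.
# Done with gradient tracking disabled so the update is not part of the graph.
with torch.no_grad():
    a -= lr * a.grad

print(a)  # approximately [0.8, 1.6]
```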

dpernes · April 3, 2018, 10:13am

I thought PyTorch would not perform gradient updates on leaf Variables, but only on model Parameters. Isn’t that right?

albanD · April 3, 2018, 10:27am

Parameter comes from the nn package and is just a convenient tool to define parameters in the context of neural networks.
From the autograd point of view, they are Variables with requires_grad=True (hence they are leaf Variables).
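A quick way to see this, assuming a recent PyTorch where Variable and Tensor are merged: an nn.Parameter is a Tensor subclass, it is a leaf, it requires grad, and an in-place write into it is rejected in the same way as for y in the original question:

```python
import torch
from torch import nn

p = nn.Parameter(torch.zeros(3))
print(isinstance(p, torch.Tensor))  # True: Parameter subclasses Tensor
print(p.is_leaf, p.requires_grad)   # True True

rejected = False
try:
    p[0] = 1.0  # in-place write into a leaf that requires grad
except RuntimeError:
    rejected = True
print(rejected)  # True
```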

The optim package works with any Variable that has requires_grad=True, so you can use it without using nn (and thus without using Parameters), but it is less convenient for neural network usage.
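As a sketch of using torch.optim without nn, assuming a recent PyTorch: the plain leaf tensor w below is handed directly to torch.optim.SGD and driven towards the minimum of the toy loss (w - 3)^2. The loss, learning rate and number of steps are arbitrary illustrative choices:

```python
import torch

w = torch.tensor([0.0], requires_grad=True)  # plain leaf tensor, no nn.Parameter
optimizer = torch.optim.SGD([w], lr=0.1)

for _ in range(100):
    optimizer.zero_grad()
    loss = ((w - 3.0) ** 2).sum()  # minimised at w = 3
    loss.backward()
    optimizer.step()

print(w)  # close to 3
```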
For example, you can use PyTorch to get the numerical value of the gradient of any mathematical function (even though this is not its original purpose). The sample below shows how to implement a function that computes the derivative of any 1D function at a given point:

import torch
from torch import autograd
from torch.autograd import Variable

def square(x):
    return x ** 2
def cube(x):
    return x ** 3

# Implement derivative of fn at point y (fn has to be R -> R)
def derivative(fn, y):
    y.requires_grad = True
    out = fn(y)
    return autograd.grad(out, y)[0]


x = Variable(torch.Tensor([3]))

print("Evaluate square at 3")
print(square(x))
print("Evaluate derivative of square at 3 (should be 2*3)")
print(derivative(square, x))
print("Evaluate derivative of cube at 3 (should be 3*(3)**2)")
print(derivative(cube, x))
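One side effect of the sample is that derivative sets requires_grad on the caller's tensor. A variant that leaves its argument untouched, assuming a recent PyTorch, can work on a detached copy instead; the lambdas below stand in for square and cube so the snippet is self-contained:

```python
import torch
from torch import autograd

def derivative(fn, y):
    # Work on a detached copy so the caller's tensor is not modified
    y = y.detach().requires_grad_(True)
    out = fn(y)
    return autograd.grad(out, y)[0]

x = torch.tensor([3.0])
print(derivative(lambda t: t ** 2, x))  # 6 (= 2 * 3)
print(derivative(lambda t: t ** 3, x))  # 27 (= 3 * 3**2)
print(x.requires_grad)                   # False: x was not touched
```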

dpernes · April 3, 2018, 10:59am

Yeah, it makes total sense. Thank you!

BruceShakeham · August 22, 2020, 2:54pm

Hi Alban,

Just curious: wouldn’t the same rule also apply to non-leaf variables? If so, how come in-place operations are allowed on non-leaf variables? Or are we running into danger when doing in-place operations on non-leaf variables, even if there is no error message? Thanks.

albanD · August 24, 2020, 4:24pm

Hi,

The difference for non-leaf Variables is that you don’t mind losing the reference to the original value: changing it in-place means that you actually just want the new version.
We do extensive checks to make sure that if you don’t get any error, the gradients are correct. Note that in some cases, you get an error even though we could compute the gradients in theory, but we prefer to be over-restrictive here.
So if you don’t get any error, it works fine.
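Both situations can be sketched as follows, assuming a recent PyTorch. In the first case, add_ modifies a tensor whose value is not needed to compute gradients, so backward succeeds. In the second, sigmoid saves its output for the backward pass, so modifying that output in-place is detected when backward runs and a RuntimeError is raised instead of silently returning wrong gradients:

```python
import torch

# Case 1: backward of a * 2 does not need c's value, so the in-place add_ is safe
a = torch.ones(3, requires_grad=True)
c = a * 2
c.add_(1)
c.sum().backward()
print(a.grad)  # 2 for every element

# Case 2: sigmoid saves its output for backward; changing it in-place is caught
b = torch.ones(3, requires_grad=True)
s = b.sigmoid()
s.mul_(2)
error_message = None
try:
    s.sum().backward()
except RuntimeError as err:
    error_message = str(err)
print(error_message)
```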