Hi, I am seeing strange behavior related to tensor unpacking and leaf variables used in in-place ops. Please see my minimal reproducible example below. I would have thought that make_xy_v1 and make_xy_v2 were equivalent creation patterns, but they give rise to different behavior. Can someone explain why these are different? My torch.__version__ is '1.0.1.post2', installed via conda, and I am running Ubuntu 16.04. Thanks!
import torch

def make_xy_v1(N):
    # Two independently allocated tensors
    x = torch.zeros((N,))
    y = torch.zeros((N,))
    return x, y

def make_xy_v2(N):
    # Unpack the two rows of a single (2, N) tensor
    x, y = torch.zeros((2, N))
    return x, y

def demo(make_xy_fn):
    x, y = make_xy_fn(N=5)
    inds = [0]
    vals = torch.tensor((1.,), requires_grad=True)
    x[inds] = vals
    y[inds] = vals

demo(make_xy_v1)
">>> Executes fine."
demo(make_xy_v2)
">>> RuntimeError: a leaf Variable that requires grad has been used in an in-place operation."
Note: If I cast inds to a tuple, it seems to trigger a different indexing path and no error is thrown.
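For reference, here is a minimal sketch of that tuple workaround (same names as above; this is how it behaved for me on this torch version, so take it as a report rather than a guarantee):

def demo_tuple(make_xy_fn):
    x, y = make_xy_fn(N=5)
    inds = (0,)  # tuple instead of list
    vals = torch.tensor((1.,), requires_grad=True)
    x[inds] = vals
    y[inds] = vals

demo_tuple(make_xy_v2)
">>> No error on this version."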
x and y are actually only views of the original big Tensor, so modifying x or y in-place changes the big Tensor in-place.
The first line modifies x in-place and makes the big Tensor require gradients, since vals requires gradients.
The second line then modifies y in-place, and thus also the big Tensor, which by now is a leaf that requires grad and is being modified in-place. Hence the error.
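You can convince yourself of the view relationship with a quick check (my own illustration, not strictly needed for the argument):

import torch

big = torch.zeros((2, 5))
x, y = big  # unpacking yields one view per row

print(x.storage().data_ptr() == big.storage().data_ptr())  # True: shared storage
print(y.storage().data_ptr() == big.storage().data_ptr())  # True
x[0] = 42.
print(big[0, 0])  # tensor(42.): writing through x changed the big Tensor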
Yes, you are right. I just tried the following and it seems weird to me from a design POV.
y = torch.tensor([0., 1., 2.], requires_grad=True)
y[(1,)] = 0.  # allowed
y[[1,]] = 0.  # throws "RuntimeError: a leaf Variable that requires grad has been used in an in-place operation."
Is there a good reason for this? They both in-place modify a leaf variable that requires grad.
The tuple indexing means that we know exactly which part of the Tensor you’re using: this one entry.
The list indexing can touch an arbitrary part of the Tensor, since the list can contain any number of indices.
In your particular case, I agree that the result is the same, but the autograd engine is over-restrictive: if it doesn’t know for sure that something is valid, it will raise an error instead of risking a silently wrong result.
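If it helps, one possible workaround (an untested sketch on my side, names are illustrative) is to do the in-place writes on the big Tensor itself and take the views afterwards; after the first write the big Tensor has a grad_fn and is no longer a leaf, so the second write is allowed:

def make_and_fill(N, inds, vals):
    big = torch.zeros((2, N))
    big[0, inds] = vals  # big now requires grad and is no longer a leaf
    big[1, inds] = vals  # fine: in-place on a non-leaf is allowed
    x, y = big           # take the views after the writes
    return x, y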
Thanks for your explanation, but the ‘arbitrary part’ doesn’t make sense to me. In my shallow view, what indexing and slicing return is in both cases determined by the index we give, so what does ‘arbitrary’ mean here?
It means that when indexing with a list, you can ask for indices 1, 3, 4 for example, which cannot be represented as a mere view of the original Tensor, and so the original Tensor itself must be modified in-place directly.
If you only ask for index 1, then a new Tensor that shares storage with the original can be returned.
So the two have different behaviours.
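To illustrate the distinction concretely (my own example): integer/tuple indexing returns a view that shares storage with the original, while list indexing allocates a new Tensor:

import torch

t = torch.arange(5.)
view = t[1]          # basic indexing: a view into t's storage
copy = t[[1, 3, 4]]  # list (advanced) indexing: a freshly allocated Tensor

print(view.storage().data_ptr() == t.storage().data_ptr())  # True
print(copy.storage().data_ptr() == t.storage().data_ptr())  # False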