Leaf variable has been moved into the graph interior


(Liangbright) #1

I am using PyTorch 0.4

import torch
X=torch.randn((100,3))
Y=torch.randn((100))
w1=torch.tensor(0.1, requires_grad=True)
w2=torch.tensor(0.1, requires_grad=True)
w3=torch.tensor(0.1, requires_grad=True)
W=torch.tensor([0.1, 0.1, 0.1], requires_grad=True)
W[0]=w1 * w2; W[1]=w2 * w3; W[2]=w3 * w1
#W=torch.cat([w1.view(1),w2.view(1),w3.view(1)])
Yp=torch.sum(X*W, dim=1)
loss = torch.nn.MSELoss()(Yp, Y)
loss.backward()

Running the code, I got:
RuntimeError: leaf variable has been moved into the graph interior

If I uncomment the #W=torch.cat(...) line instead, it works fine.

Use case:
w1, w2, w3, …, are many, many … tensors from the outputs of some modules.
We can then assemble them into one big tensor via (1) or (2) below:
(1) use torch.cat
(2) create a tensor W and assign w1, w2, w3 to subsections of W. This makes it easier to control where w1/w2/w3 are placed in W.
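A minimal sketch of the two assembly options, assuming a recent PyTorch release rather than 0.4. The helpers `build_with_cat`, `build_with_buffer` and `grads` are hypothetical names used only for illustration. As far as I can tell, the only requirement for option (2) is that the buffer itself is created without `requires_grad=True`; if so, both options should produce the same gradients for w1, w2, w3.

```python
import torch

torch.manual_seed(0)
X = torch.randn(100, 3)
Y = torch.randn(100)

def build_with_cat(w1, w2, w3):
    # Option (1): concatenate 1-element tensors; the result is a non-leaf with a grad_fn
    return torch.cat([(w1 * w2).view(1), (w2 * w3).view(1), (w3 * w1).view(1)])

def build_with_buffer(w1, w2, w3):
    # Option (2): preallocate a plain buffer (no requires_grad), then write into its slots
    W = torch.zeros(3)
    W[0] = w1 * w2
    W[1] = w2 * w3
    W[2] = w3 * w1
    return W

def grads(builder):
    # Fresh scalar parameters for each run so the gradients do not accumulate
    ws = [torch.tensor(0.1, requires_grad=True) for _ in range(3)]
    W = builder(*ws)
    Yp = torch.sum(X * W, dim=1)
    loss = torch.nn.MSELoss()(Yp, Y)
    loss.backward()
    return torch.stack([w.grad for w in ws])

g_cat = grads(build_with_cat)
g_buf = grads(build_with_buffer)
print(torch.allclose(g_cat, g_buf))
```

Option (2) keeps the "put this value at that position" style, while option (1) never creates a buffer at all; with many pieces, collecting them in a list and calling torch.cat once is usually the simpler route.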


#2

I don't think you need to initialize W with data and requires_grad=True if you are overwriting the values in the next line.

Would this work for you?

import torch
X=torch.randn((100,3))
Y=torch.randn((100))
w1=torch.tensor(0.1, requires_grad=True)
w2=torch.tensor(0.1, requires_grad=True)
w3=torch.tensor(0.1, requires_grad=True)
W=torch.empty(3, requires_grad=False)  # plain buffer; every slot is overwritten below
W[0]=w1 * w2; W[1]=w2 * w3; W[2]=w3 * w1
Yp=torch.sum(X*W, dim=1)
loss = torch.nn.MSELoss()(Yp, Y)
loss.backward()
print(w1.grad)

(Liangbright) #3

It works. Thank you!

I'm curious: why does torch.empty work?


#4

torch.empty just creates a tensor with uninitialized values.
Try printing it right after creating it and you will see that the values are quite random.

You could also create it with torch.zeros or skip the initialization completely:

w1=torch.tensor([0.1], requires_grad=True)
w2=torch.tensor([0.1], requires_grad=True)
w3=torch.tensor([0.1], requires_grad=True)
W=torch.cat((w1 * w2, w2 * w3, w3 * w1))
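Since the w's in the first post are 0-dim tensors, torch.cat needs them reshaped to 1 element first, whereas torch.stack (not mentioned in this thread, swapped in here as an alternative) accepts them directly. A small sketch, assuming a recent PyTorch release, comparing the three ways of building W:

```python
import torch

# 0-dim scalars, as in the first post
w1 = torch.tensor(0.1, requires_grad=True)
w2 = torch.tensor(0.1, requires_grad=True)
w3 = torch.tensor(0.1, requires_grad=True)

# torch.stack adds a new dimension, so three 0-dim inputs give a tensor of shape (3,)
W_stack = torch.stack((w1 * w2, w2 * w3, w3 * w1))

# torch.cat joins along an existing dimension, so each scalar must become 1-element first
W_cat = torch.cat(((w1 * w2).view(1), (w2 * w3).view(1), (w3 * w1).view(1)))

# Preallocated buffer: any initializer works as long as it does not require grad;
# torch.zeros avoids the garbage values that torch.empty would show
W_buf = torch.zeros(3)
W_buf[0] = w1 * w2
W_buf[1] = w2 * w3
W_buf[2] = w3 * w1

print(W_stack.shape, torch.equal(W_stack, W_cat), torch.equal(W_stack, W_buf))
```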

(Liangbright) #5

I tried the following, and all of them work:

W=torch.empty(3, requires_grad=False)
W=torch.zeros(3, requires_grad=False)
W=torch.ones(3, requires_grad=False)
W=torch.randn(3, requires_grad=False); W[0]=0.1; W[1]=0.1;W[2]=0.1
W_numpy=numpy.array([0.1,0.1,0.1],dtype='float32'); W=torch.from_numpy(W_numpy)
W=torch.arange(0,3)
W=torch.linspace(0,1,3)
W=torch.logspace(-10, 10,3)
W=torch.full((3,), 3.141592)
W=torch.tensor([0.1, 0.1, 0.1])

If I add requires_grad=True to any of the above expressions, I get
"RuntimeError: leaf variable has been moved into the graph interior"

Then I ran:

W=torch.tensor([0.1, 0.1, 0.1])
W.requires_grad
Out[17]: False
W.is_leaf
Out[18]: True
W[0]=w1 * w2; W[1]=w2 * w3; W[2]=w3 * w1
W.is_leaf
Out[19]: False

W=torch.tensor([0.1, 0.1, 0.1], requires_grad=True)
W.requires_grad
Out[20]: True
W.is_leaf
Out[21]: True
W[0]=w1 * w2; W[1]=w2 * w3; W[2]=w3 * w1
W.is_leaf
Out[22]: False

So, what does the error message really mean?


#6

You can read a nice explanation of leaf variables in this post.

In general, you should avoid modifying a leaf variable with an in-place operation.
I haven't run into this error message myself, but it seems to be related to an in-place modification, since you are re-assigning the values of W in this line:

W[0]=w1 * w2; W[1]=w2 * w3; W[2]=w3 * w1
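My understanding of the error, sketched under the assumption of a recent PyTorch release (the message wording has changed since 0.4, and newer versions may raise at the assignment itself): a leaf that requires grad is where gradients accumulate and has no grad_fn, so autograd refuses to record an in-place write into it. A leaf that does not require grad can be written into; autograd then records the write, W gains a grad_fn and is_leaf becomes False, which matches the is_leaf outputs in the previous post.

```python
import torch

w1 = torch.tensor(0.1, requires_grad=True)

# Leaf that does NOT require grad: the in-place write is recorded in the graph,
# so W gains a grad_fn (CopySlices) and stops being a leaf.
W = torch.tensor([0.1, 0.1, 0.1])
W[0] = w1 * 2
print(W.is_leaf, W.requires_grad, type(W.grad_fn).__name__)

# Leaf that DOES require grad: autograd rejects the in-place write.
V = torch.tensor([0.1, 0.1, 0.1], requires_grad=True)
try:
    V[0] = w1 * 2
    rejected = False
except RuntimeError as err:  # exact wording differs between PyTorch versions
    rejected = True
    print(err)

# If a grad-requiring leaf really must be filled in place (e.g. custom init),
# do it outside autograd; V stays a leaf afterwards.
with torch.no_grad():
    V[0] = 0.5
print(V.is_leaf)
```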

(donkeysaddle) #7

Is that an in-place modification though?


import torch
X=torch.randn((100,3))
Y=torch.randn((100))

w1=torch.tensor(0.1, requires_grad=True)
w2=torch.tensor(0.1, requires_grad=True)
w3=torch.tensor(0.1, requires_grad=True)

W=torch.tensor([0.1, 0.1, 0.1])
print(id(W[0]),id(W[1]),id(W[2]))
W[0]=w1 * w2; W[1]=w2 * w3; W[2]=w3 * w1
print(id(W[0]),id(W[1]),id(W[2]))

4778466184 4778466184 4778466184
4778467120 4778467120 4778467120

The addresses are different. In fact, I'm not even sure why the elements of the tensor W don't have unique addresses, at least before they are assigned values based on the w's. Once the elements are assigned values, particularly values based on tensors with requires_grad=True, they all seem to get the same address. Is that correct?


#8

I'm not sure which id is shown when you index into the tensor, so I would compare the ids of the complete tensor W, which stay the same.
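A sketch of that comparison, assuming CPython and a recent PyTorch release. Each W[i] creates a new temporary tensor object, and CPython may reuse the freed memory for the next temporary, which would explain the identical ids above. data_ptr() is a more direct check for in-place behaviour, since it reports the address of the underlying data rather than of the Python wrapper object.

```python
import torch

W = torch.tensor([0.1, 0.1, 0.1])
w1 = torch.tensor(0.1, requires_grad=True)

# Record the Python object identity and the address of W's data before the write
obj_before = id(W)
ptr_before = W.data_ptr()

W[0] = w1 * w1  # copies the value into W's existing storage

same_object = id(W) == obj_before
same_storage = W.data_ptr() == ptr_before
# Element i sits i * element_size() bytes after the start of W's data
offset_1 = W[1].data_ptr() - W.data_ptr()
print(same_object, same_storage, offset_1)
```

Both checks stay True, which is consistent with the assignment being an in-place modification of W.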


(Marwa) #9

I am using PyTorch 0.4.1.
I am trying to create a cost-sensitive loss by modifying the source code of the cross-entropy loss.
I wrote a new LogSoftmax function, shown below:

def LogSoftmax(input,labels,ksi):
    # input = the output layer of my model (nn.Linear(N,nb_classes))
    # labels = target classes
    # ksi = cost-sensitive matrix
    layer = torch.empty(input.shape,requires_grad=True)
    for j in range(input.shape[0]):
        for i in range(input.shape[1]):
            layer[j,i] = log(ksi[labels[i],i]*exp(input[j,i])) - log(sum([ksi[labels[i],k]*exp(input[j,k]) for k in range(len(ksi))]))
    return layer

input, labels and ksi are tensors and have requires_grad set to True.
I get an error message from loss.backward() during the training phase.
For my function's return value 'layer':
If requires_grad = True, I get the following error message: RuntimeError: leaf variable has been moved into the graph interior.
If requires_grad = False, I get: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.

Thank you
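One possible cause of the second error, assuming log and exp in the code above are math.log and math.exp (the imports are not shown, so this is a guess): the math functions convert a tensor to a plain Python float, which drops it from the autograd graph. A minimal sketch of that effect, with made-up values for x:

```python
import math
import torch

x = torch.tensor([0.5, -0.2], requires_grad=True)

# math.exp turns the 0-dim tensor into a Python float: no grad_fn survives
as_float = math.exp(x[0])
# torch.exp keeps the result as a tensor that is part of the graph
as_tensor = torch.exp(x[0])

# Floats written into a non-grad buffer leave it outside autograd,
# so backward() fails with "element 0 of tensors does not require grad ..."
layer = torch.empty(2)
layer[0] = as_float
layer[1] = math.exp(x[1])
try:
    layer.sum().backward()
    failed = False
except RuntimeError:
    failed = True

# With torch.exp the same buffer joins the graph and gradients reach x
layer = torch.empty(2)
layer[0] = as_tensor
layer[1] = torch.exp(x[1])
layer.sum().backward()
print(failed, x.grad)
```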


#10

I think your input should require gradients, as it's most likely the result of some other model operation.
If so, I assume you don't need to set requires_grad=True for layer.
Could you post some random inputs for all tensors, so that I could debug it?

Also, I'm not completely sure how your code works, but the nested loop could probably be avoided using matrix operations.
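A sketch of the matrix-operation version, assuming ksi has strictly positive entries, labels holds int64 class indices, and the weights for sample j come from row labels[j] of ksi (as in the labels[j] version of the loop). The helper names are hypothetical. It relies on log(w * exp(x)) = log(w) + x and log(sum_k w_k * exp(x_k)) = logsumexp(log(w) + x), with torch.logsumexp swapped in for numerical stability; the loop version is kept only to check the result.

```python
import torch

def cost_sensitive_log_softmax(logits, labels, ksi):
    # Hypothetical vectorised helper. Assumes logits: (B, C), labels: (B,) int64,
    # ksi: (C, C) with strictly positive entries.
    weights = ksi[labels]                 # row labels[j] of ksi for each sample j -> (B, C)
    scores = torch.log(weights) + logits  # log(w * exp(x)) rewritten as log(w) + x
    # log(sum_k w_k * exp(x_k)) == logsumexp(log(w) + x) over the class dimension
    return scores - torch.logsumexp(scores, dim=1, keepdim=True)

def loop_log_softmax(logits, labels, ksi):
    # The nested loop from the thread (labels[j] version), written with torch ops
    layer = torch.empty(logits.shape)
    for j in range(logits.shape[0]):
        for i in range(logits.shape[1]):
            num = torch.log(ksi[labels[j], i] * torch.exp(logits[j, i]))
            den = torch.log(sum(ksi[labels[j], k] * torch.exp(logits[j, k])
                                for k in range(len(ksi))))
            layer[j, i] = num - den
    return layer

torch.manual_seed(0)
logits = torch.randn(2, 11, requires_grad=True)
labels = torch.tensor([0, 10])
ksi_ones = torch.ones(11, 11)
ksi_rand = torch.rand(11, 11) + 0.1  # random but strictly positive costs

fast = cost_sensitive_log_softmax(logits, labels, ksi_rand)
slow = loop_log_softmax(logits, labels, ksi_rand)
print(torch.allclose(fast, slow, atol=1e-5))
# With an all-ones ksi, log(1) = 0 and the result reduces to the ordinary log_softmax
print(torch.allclose(cost_sensitive_log_softmax(logits, labels, ksi_ones),
                     torch.log_softmax(logits, dim=1), atol=1e-6))
```

The result is an ordinary non-leaf tensor, so no preallocated buffer and no requires_grad flag are needed; gradients flow to logits (and to ksi, if it requires grad).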


(Marwa) #11

Thank you for your reply.
My input is in fact the output of my model, i.e. the last fully connected layer (nn.Linear(…,nb_classes)). It works well when using criterion = nn.CrossEntropyLoss(). The problem only appears when using the loss function I have implemented.
If I give my model a batch of 2 images, for instance, I obtain the following:
input = tensor([[-0.4151, 0.0702, -0.4210, 0.4268, 0.0183, 0.1305, -0.5303, -0.1792,
-0.6745, -0.0481, -0.2302],
[-0.0338, -0.3070, -0.8314, 0.3735, 0.4717, -0.2299, -0.4984, -0.1337,
-0.3467, 0.1117, -0.2985]], grad_fn=<ThAddmmBackward>)
labels = tensor([0,10])
ksi = torch.tensor([[1.,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.],
[1.,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.],
[1.,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.],
[1.,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.],
[1.,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.],
[1.,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.],
[1.,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.],
[1.,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.],
[1.,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.],
[1.,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.],
[1.,1.,1.,1.,1.,1.,1.,1.,1.,1.,1.]],requires_grad = True)
At the moment, my cost-sensitive matrix is a matrix of ones, because I just want to debug the code first.
I have slightly modified my LogSoftmax function:
def LogSoftmax(input,labels,ksi):
    layer = torch.empty(input.shape)
    for j in range(input.shape[0]):
        for i in range(input.shape[1]):
            layer[j,i] = log(ksi[labels[j],i]*exp(input[j,i])) - log(sum([ksi[labels[j],k]*exp(input[j,k]) for k in range(len(ksi))]))
    return layer


(Marwa) #12

I have resolved the problem this way:

def LogSoftmax(input,labels,ksi):
    layer = torch.autograd.Variable(torch.empty(input.shape), requires_grad = True)
    layer1 = layer.clone()
    for j in range(input.shape[0]):
        for i in range(input.shape[1]):
            layer1[j,i] = torch.log(ksi[labels[j],i]*torch.exp(input[j,i])) - torch.log(sum([ksi[labels[j],k]*torch.exp(input[j,k]) for k in range(len(ksi))]))
    return layer1
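A sketch of why this fix works, under my reading of it and assuming a recent PyTorch release: layer.clone() returns a non-leaf tensor, so the in-place writes no longer touch a leaf, and the switch to torch.log/torch.exp keeps every value in the graph. Since 0.4, torch.autograd.Variable is no longer needed, and a buffer that never required grad should behave the same. The helper fill is hypothetical and uses a plain log-softmax instead of the cost-sensitive one to stay short.

```python
import torch

def fill(buf, logits):
    # Writes log-softmax values entry by entry with torch ops, so each write is recorded
    for j in range(logits.shape[0]):
        for i in range(logits.shape[1]):
            buf[j, i] = logits[j, i] - torch.logsumexp(logits[j], dim=0)
    return buf

torch.manual_seed(0)
logits = torch.randn(2, 4, requires_grad=True)

# Pattern from the post: a grad-requiring leaf, cloned into a non-leaf buffer
leaf = torch.empty(logits.shape, requires_grad=True)
out_clone = fill(leaf.clone(), logits)

# Simpler alternative: a buffer that never required grad in the first place
out_plain = fill(torch.empty(logits.shape), logits)

g_clone, = torch.autograd.grad(out_clone[:, 0].sum(), logits)
g_plain, = torch.autograd.grad(out_plain[:, 0].sum(), logits)
print(leaf.is_leaf, out_clone.is_leaf, torch.allclose(g_clone, g_plain))
```

Both buffers should give the same gradients with respect to logits, so the extra leaf and clone look avoidable.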