Example for “One of the differentiated Tensors appears to not have been used in the graph”

Can anyone please give an example of this scenario? I am struggling to understand why this could even happen.

Hi,

Here is an example:

import torch

a = torch.rand(10, requires_grad=True)
b = torch.rand(10, requires_grad=True)

# b is never used to compute output, so it is not part of the graph.
output = (2 * a).sum()

# This raises: "One of the differentiated Tensors appears to not have been used in the graph"
torch.autograd.grad(output, (a, b))

Thank you!! Yeah, it makes sense now.

Is there any way to add b to the graph so that the derivative can be computed?

Well, if b is not in the graph, then the derivative is just 0 everywhere. You don’t need to add it to the graph to get the derivatives.
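As a small illustration (a sketch, not from the original reply): passing allow_unused=True makes autograd.grad return None for the unused input instead of raising, and that None can be read as a gradient that is zero everywhere.

import torch

a = torch.rand(10, requires_grad=True)
b = torch.rand(10, requires_grad=True)

output = (2 * a).sum()

# b was never used, so its gradient comes back as None (i.e. zero everywhere).
grad_a, grad_b = torch.autograd.grad(output, (a, b), allow_unused=True)
print(grad_a)  # tensor of 2s
print(grad_b)  # None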


Thanks for your reply. I asked that because of the following (I hope you can shed some light): I have a model that forward-propagates using features a that were calculated as a function of another vector c without PyTorch – I used numpy and then converted a to a tensor (that’s why they are not part of the graph, I guess). The output of the model I am working on, a scalar, can be differentiated with respect to c. However, because c is nowhere registered in the computational graph, it will not be possible to compute the derivative of the output with respect to it. Is this correct? And is there no way to compute that gradient with autograd unless c is used somewhere and registered?

Hi,

You are right that if you don’t use PyTorch for some operations, you won’t be able to use autograd through them.
The way to get around this is to create a custom Function (see here how to) that specifies how to compute the backward for a given op. That way, you can wrap the code that autograd cannot handle in the forward and write the backward by hand. You will then be able to use it like any differentiable function in the autograd.
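For illustration, here is a minimal sketch of such a custom Function. It assumes the numpy step is simply squaring c (the name NumpySquare and the squaring op are just assumptions for this example), with the backward written by hand:

import numpy as np
import torch

class NumpySquare(torch.autograd.Function):
    @staticmethod
    def forward(ctx, c):
        ctx.save_for_backward(c)
        # The numpy part that autograd cannot track.
        return torch.from_numpy(np.square(c.detach().numpy()))

    @staticmethod
    def backward(ctx, grad_output):
        c, = ctx.saved_tensors
        # Hand-written backward: d(c**2)/dc = 2 * c.
        return grad_output * 2 * c

c = torch.rand(10, requires_grad=True)
a = NumpySquare.apply(c)   # usable like any differentiable op
output = a.sum()
grad_c, = torch.autograd.grad(output, c)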


Thanks for your reply. The link seems very helpful. Hopefully I can extend the functionality as I need. 🙂

Thank you for the clarification. So this will give an error unless we set “allow_unused=True”. My question: when we have a huge number of parameters and some of them are not used in the graph, how can we just ignore them instead of using “allow_unused=True”?
Getting None back after using “allow_unused=True” will cause another problem.

If you already know that these parameters are not used, you can filter them out and not give them to autograd.grad() in the first place. That way, you won’t have to pass allow_unused=True.
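A minimal sketch of that filtering, assuming you already know which tensors were used (the names here are only for illustration):

import torch

a = torch.rand(10, requires_grad=True)
b = torch.rand(10, requires_grad=True)  # known to be unused below

output = (2 * a).sum()

# Only pass the inputs that actually contributed to output,
# so allow_unused=True is not needed and no None comes back.
used_inputs = (a,)
grads = torch.autograd.grad(output, used_inputs)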


I did that, thank you 😊

Do you know why this block of code gives the error “One of the differentiated Tensors appears to not have been used in the graph”?

import torch

a = torch.rand(10, requires_grad=True)

output = (2 * a).sum()

# This raises the error above.
torch.autograd.grad(output, a[0])

Hi,

This is because a[0] is a different Tensor from a. And that Tensor (that you just created) has not been used to compute the output.
You will have to do:

grad_a, = torch.autograd.grad(output, a)
grad_a_0 = grad_a[0]
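(Here grad_a is a length-10 tensor filled with 2s, since output = (2 * a).sum(), so grad_a_0 is simply 2.0.)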


Since a[0] is part of a, why is it considered a different tensor? Could you also point out some reference material on this?

Since a[0] is part of a

Well, it is not. It is the same as doing a.select(0, 0), and it just returns a new Tensor that shares memory.

You can check our doc about Tensor views for more details: Tensor Views — PyTorch 2.1 documentation
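A quick sketch of that point, for illustration: indexing returns a new Tensor object that views the same storage.

import torch

a = torch.rand(10, requires_grad=True)
v = a[0]

print(v is a)                        # False: a new Tensor object
print(v.data_ptr() == a.data_ptr())  # True: it shares a's memory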

Hi @albanD,

Based on Tony’s code, suppose we have a batch of data x, and we need to calculate the gradients one by one, like:

output[i] = fn(x[i])
grad = torch.autograd.grad(output[i], x[i])

Considering the explanations you provided, either we can do

grad_x, = torch.autograd.grad(output[i], x)
grad_x_i = grad_x[i]

or we can do

u = x[i]
output = fn(u)
grad_x_i, = torch.autograd.grad(output, u)
grad_x[i] = grad_x_i

Will the latter way be faster than the former one? Since it now seems we need to calculate the gradient one input at a time, can the calculation be accelerated on a GPU? Thank you for your time in advance!

Hi,

Yes, both will give the same result.
And indeed, the second will be a bit more efficient, as you only evaluate the graph on the part of the input you want.
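To make the second pattern concrete, here is a minimal sketch with a toy fn (the function and the shapes are just assumptions for the example):

import torch

def fn(u):
    return (u ** 2).sum()

x = torch.rand(5, 3, requires_grad=True)
grad_x = torch.zeros_like(x)

for i in range(x.shape[0]):
    u = x[i]                         # view of row i, used in the graph below
    output = fn(u)
    grad_u, = torch.autograd.grad(output, u)
    grad_x[i] = grad_u               # only row i was evaluated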

Hello. I have the same error, and I described my question in detail here: Calculating loss with autograd: One of the differentiated Tensors appears to not have been used in the graph

Is this happening because I need to create a custom PyTorch function for the curl?

If it shares memory, then clearly output does have a gradient with respect to a[0]. If I changed a[0] and then recomputed the loss, it would be updated; we could then measure that update relative to how much I changed a[0]. That is the very definition of the gradient.

import torch
a = torch.rand(10, requires_grad=True)

output = (2 * a).sum()

print(output)
a[0] += 1
output2 = (2 * a).sum()
print(output2)

tensor(11.2303, grad_fn=<SumBackward0>)
tensor(13.2303, grad_fn=<SumBackward0>)

I guess my example code above does not actually work in a more up-to-date version of PyTorch. It throws an error instead…

Which admittedly weakens my argument that it’s clear there is a gradient between a[0] and output, but that does not satisfy me. If I wanted to know the cost/gradient when a is shifted along a single dimension, I would want the above code to work.
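For what it’s worth, here is a sketch of one way to still observe that shift in newer PyTorch (an assumption about the intent, not from the thread): do the in-place update outside autograd tracking.

import torch

a = torch.rand(10, requires_grad=True)
output = (2 * a).sum()
print(output)

# In-place edits of a leaf that requires grad must happen outside autograd
# tracking, otherwise newer PyTorch raises an error.
with torch.no_grad():
    a[0] += 1

output2 = (2 * a).sum()
print(output2)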