Example for “One of the differentiated Tensors appears to not have been used in the graph”

Can anyone please give an example of this scenario? I am struggling to understand why this could even happen.

Hi,

Here is an example:

import torch

a = torch.rand(10, requires_grad=True)
b = torch.rand(10, requires_grad=True)

# b is never used to compute output, so it is not part of the graph.
output = (2 * a).sum()

# This raises: "One of the differentiated Tensors appears to not have been used in the graph"
torch.autograd.grad(output, (a, b))

Thank you!! Yeah, it makes sense now.

Is there any way to add b to the graph so that the derivative can be computed?

Well, if b is not in the graph, then the derivative is just 0 everywhere. You don’t need to add it to the graph to get the derivatives.
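As a small illustration (a sketch, not from the original reply): passing allow_unused=True makes autograd.grad return None for the unused input instead of raising, and that None can be read as a gradient that is zero everywhere.

import torch

a = torch.rand(10, requires_grad=True)
b = torch.rand(10, requires_grad=True)

output = (2 * a).sum()

# b was never used, so its gradient comes back as None (i.e. zero everywhere).
grad_a, grad_b = torch.autograd.grad(output, (a, b), allow_unused=True)
print(grad_a)  # tensor of 2s
print(grad_b)  # None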


Thanks for your reply. I asked that because of the following (I hope you can shed some light): I have a model that forward-propagates using features a that were calculated as a function of another vector c without PyTorch – I used numpy and then converted a to a tensor (that’s why they are not part of the graph, I guess). The output of the model I am working on, a scalar, can be differentiated with respect to c. However, because c is nowhere registered in the computational graph, it will not be possible to compute the derivative of the output with respect to it. Is this correct? And is there no way to compute that gradient with autograd unless c is used somewhere and registered?

Hi,

You are right that if you don’t use PyTorch for some operations, you won’t be able to use autograd through them.
The way to get around this is to create a custom Function (see here how to) that specifies how to compute the backward for a given op. That way, you can wrap the code that autograd cannot handle in the forward and write the backward by hand. You will then be able to use it like any differentiable function in the autograd.
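For illustration, here is a minimal sketch of such a custom Function. It assumes the numpy step is simply squaring c (the name NumpySquare and the squaring op are just assumptions for this example), with the backward written by hand:

import numpy as np
import torch

class NumpySquare(torch.autograd.Function):
    @staticmethod
    def forward(ctx, c):
        ctx.save_for_backward(c)
        # The numpy part that autograd cannot track.
        return torch.from_numpy(np.square(c.detach().numpy()))

    @staticmethod
    def backward(ctx, grad_output):
        c, = ctx.saved_tensors
        # Hand-written backward: d(c**2)/dc = 2 * c.
        return grad_output * 2 * c

c = torch.rand(10, requires_grad=True)
a = NumpySquare.apply(c)   # usable like any differentiable op
output = a.sum()
grad_c, = torch.autograd.grad(output, c)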


Thanks for your reply. The link seems very helpful. Hopefully I can extend the functionality as I need. 🙂

Thank you for the clarification. So this will give an error unless we set “allow_unused=True”. My question: when we have a huge number of parameters and some of them are not used in the graph, how can we just ignore them instead of using “allow_unused=True”?
Getting None back after using “allow_unused=True” will cause another problem.

If you already know that these parameters are not used, you can filter them out and not give them to autograd.grad() in the first place. That way, you won’t have to pass allow_unused=True.
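A minimal sketch of that filtering, assuming you already know which tensors were used (the names here are only for illustration):

import torch

a = torch.rand(10, requires_grad=True)
b = torch.rand(10, requires_grad=True)  # known to be unused below

output = (2 * a).sum()

# Only pass the inputs that actually contributed to output,
# so allow_unused=True is not needed and no None comes back.
used_inputs = (a,)
grads = torch.autograd.grad(output, used_inputs)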


I did that, thank you 😊

Do you know why this block of code gives the error “One of the differentiated Tensors appears to not have been used in the graph”?

import torch

a = torch.rand(10, requires_grad=True)

output = (2 * a).sum()

# This raises the error above.
torch.autograd.grad(output, a[0])

Hi,

This is because a[0] is a different Tensor from a. And that Tensor (that you just created) has not been used to compute the output.
You will have to do:

grad_a, = torch.autograd.grad(output, a)
grad_a_0 = grad_a[0]
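(Here grad_a is a length-10 tensor filled with 2s, since output = (2 * a).sum(), so grad_a_0 is simply 2.0.)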


Since a[0] is part of a, why is it considered a different tensor? Could you also point out some reference material on this?

Since a[0] is part of a

Well, it is not. It is the same as doing a.select(0, 0), and it just returns a new Tensor that shares memory.

You can check our doc about Tensor views for more details: Tensor Views — PyTorch 2.1 documentation
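A quick sketch of that point, for illustration: indexing returns a new Tensor object that views the same storage.

import torch

a = torch.rand(10, requires_grad=True)
v = a[0]

print(v is a)                        # False: a new Tensor object
print(v.data_ptr() == a.data_ptr())  # True: it shares a's memory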

Hi @albanD,

Based on Tony’s code, suppose we have a batch of data x, and we need to calculate the gradients one by one, like:

output[i] = fn(x[i])
grad = torch.autograd.grad(output[i], x[i])

Considering the explanations you provided, either we can do

grad_x, = torch.autograd.grad(output[i], x)
grad_x_i = grad_x[i]

or we can do

u = x[i]
output = fn(u)
grad_x_i, = torch.autograd.grad(output, u)
grad_x[i] = grad_x_i

Will the latter way be faster than the former one? Since it now seems we need to calculate the gradient one input at a time, can the calculation be accelerated on a GPU? Thank you for your time in advance!

Hi,

Yes, both will give the same result.
And indeed, the second will be a bit more efficient, as you only evaluate the graph on the part of the input you want.
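To make the second pattern concrete, here is a minimal sketch with a toy fn (the function and the shapes are just assumptions for the example):

import torch

def fn(u):
    return (u ** 2).sum()

x = torch.rand(5, 3, requires_grad=True)
grad_x = torch.zeros_like(x)

for i in range(x.shape[0]):
    u = x[i]                         # view of row i, used in the graph below
    output = fn(u)
    grad_u, = torch.autograd.grad(output, u)
    grad_x[i] = grad_u               # only row i was evaluated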

Hello. I have the same error, and I described my question in detail here: Calculating loss with autograd: One of the differentiated Tensors appears to not have been used in the graph

Is this happening because I need to create a custom PyTorch function for the curl?

If it shares memory, then clearly output does have a gradient with respect to a[0]. If I changed a[0] and then recomputed the loss, it would be updated; we could then measure that update relative to how much I changed a[0]. That is the very definition of the gradient.

import torch
a = torch.rand(10, requires_grad=True)

output = (2 * a).sum()

print(output)
a[0] += 1
output2 = (2 * a).sum()
print(output2)

tensor(11.2303, grad_fn=<SumBackward0>)
tensor(13.2303, grad_fn=<SumBackward0>)

I guess my example code above does not actually work in a more up-to-date version of PyTorch. It throws an error instead…

Which admittedly weakens my argument that it’s clear there is a gradient between a[0] and output, but that does not satisfy me. If I wanted to know the cost/gradient when a is shifted along a single dimension, I would want the above code to work.
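For what it’s worth, here is a sketch of one way to still observe that shift in newer PyTorch (an assumption about the intent, not from the thread): do the in-place update outside autograd tracking.

import torch

a = torch.rand(10, requires_grad=True)
output = (2 * a).sum()
print(output)

# In-place edits of a leaf that requires grad must happen outside autograd
# tracking, otherwise newer PyTorch raises an error.
with torch.no_grad():
    a[0] += 1

output2 = (2 * a).sum()
print(output2)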