Can anyone please give an example of this scenario, I am struggling to understand why this could even happen

Hi,

Here is an example:

```
import torch

a = torch.rand(10, requires_grad=True)
b = torch.rand(10, requires_grad=True)
output = (2 * a).sum()
# Raises "One of the differentiated Tensors appears to not have been
# used in the graph" because b never contributes to output:
torch.autograd.grad(output, (a, b))
```

Thank you !! yeah it makes sense now

Is there any way to add `b` to the graph so that the derivative can be computed?

Well if `b` is not in the graph then the derivative is just `0` everywhere. You don't need to add it to the graph to get the derivatives.
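A minimal sketch of that behavior: with `allow_unused=True`, autograd returns `None` for the unused input, which you can read as an all-zero gradient.

```
import torch

a = torch.rand(10, requires_grad=True)
b = torch.rand(10, requires_grad=True)
output = (2 * a).sum()

# b never influences output, so asking for its gradient would normally
# raise; allow_unused=True returns None for it instead.
grad_a, grad_b = torch.autograd.grad(output, (a, b), allow_unused=True)
print(grad_a)   # every entry is 2.0, since d(2*a_i)/d(a_i) = 2
print(grad_b)   # None, standing in for an all-zero gradient
```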

Thanks for your reply. I asked that because of the following (I hope you could shed some light): I have a model that forward-propagates using features **a** that were calculated as a function of another vector **c** without pytorch: I used numpy and then converted **a** to a tensor (that's why they are not part of the graph, I guess). The output of the model I am working on, a scalar, can be differentiated with respect to **c**. However, because **c** is nowhere registered in the computational graph, it will not be possible to compute the derivative of the output with respect to it. Is this correct? And there is no way to compute that gradient with autograd unless **c** is used somewhere and registered.

Hi,

You are right that if you don't use pytorch for some things, you won't be able to use autograd.

The way to get around this is to create a custom Function (see here how to) that specifies how to compute the backward for a given op. That way, you can wrap the code that autograd cannot handle in the forward and write the backward by hand. Then you will be able to use this as any other differentiable function in the autograd.
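A minimal sketch of such a custom Function. The numpy op here (squaring) is just a stand-in for whatever code autograd cannot trace; the `NumpySquare` name and the op itself are assumptions for illustration.

```
import numpy as np
import torch

class NumpySquare(torch.autograd.Function):
    """Wraps a numpy computation (here x**2) so autograd can use it."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)           # keep x for the backward pass
        out = np.square(x.detach().cpu().numpy())
        return torch.from_numpy(out).to(x.device)

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        return grad_output * 2 * x         # hand-written d(x**2)/dx = 2x

c = torch.rand(5, requires_grad=True)
a = NumpySquare.apply(c)                   # a is now part of the graph
grad_c, = torch.autograd.grad(a.sum(), c)  # gradient w.r.t. c works
```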

Thanks for your reply. The link seems very helpful. Hopefully I can extend the functionality as I need.

Thank you for the clarification. So this will give an error unless we set `allow_unused=True`. My question: when we have a huge number of parameters and some of them are not used in the graph, how can we just ignore them instead of using `allow_unused=True`?

Because getting `None` after using `allow_unused=True` will cause another problem.

If you already know that these parameters are not used, you can filter them out and not give them to `autograd.grad()` in the first place. That way, you won't have to pass `allow_unused=True`.
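For example, assuming you can tell up front which parameters feed into the loss (tracked here with a simple name set, purely for illustration):

```
import torch

a = torch.rand(10, requires_grad=True)
b = torch.rand(10, requires_grad=True)   # never used in the loss below
params = {"a": a, "b": b}
used_names = {"a"}                       # known in advance in this sketch

output = (2 * params["a"]).sum()

# Only hand autograd.grad() the parameters that are actually used, so
# allow_unused=True (and the resulting Nones) is not needed.
used = [params[n] for n in used_names]
grads = torch.autograd.grad(output, used)
```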

I did that, thank you

Do you know why this block of code gives the error "One of the differentiated Tensors appears to not have been used in the graph"?

```
import torch

a = torch.rand(10, requires_grad=True)
output = (2 * a).sum()
torch.autograd.grad(output, a[0])
```

Hi,

This is because `a[0]` is a different Tensor from `a`. And that Tensor (that you just created) has not been used to compute the `output`.

You will have to do:

```
grad_a, = torch.autograd.grad(output, a)
grad_a_0 = grad_a[0]
```

Since `a[0]` is part of `a`, why is it considered a different tensor? Could you also point out some reference materials regarding this?

> Since a[0] is part of a

Well it is not. It is the same as doing `a.select(0, 0)`, and it just returns a new Tensor that shares memory.

You can check our doc about Tensor views for more details: Tensor Views in the PyTorch documentation.
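A quick check of what indexing actually returns:

```
import torch

a = torch.rand(10, requires_grad=True)
v = a[0]                                 # same as a.select(0, 0)

print(v is a)                            # False: a brand-new Tensor object
print(v.data_ptr() == a.data_ptr())      # True: the view shares a's storage
print(v.grad_fn)                         # a SelectBackward node: v is the
                                         # *output* of an op applied to a
```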

Hi @albanD,

Based on Tony's code, suppose we have a batch of data x, and we need to calculate the grad one by one, like:

```
output[i] = fn(x[i])
grad = torch.autograd.grad(output[i], x[i])
```

Considering the explanations you provided, either we can do

```
grad_x, = torch.autograd.grad(output[i], x)
grad_x_i = grad_x[i]
```

or we can do

```
u = x[i]
output = fn(u)
grad_x_i, = torch.autograd.grad(output, u)
grad_x[i] = grad_x_i
```

Will the latter way be faster than the former one? Since it now seems we need to process the inputs one by one, can the calculation still be accelerated on GPU? Thank you for your time in advance!

Hi,

Yes both will give the same result.

And indeed the second will be a bit more efficient as you only evaluate on the part of the input you want.
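A small check that the two approaches agree. The squared-sum `fn` here is a toy stand-in for the real model:

```
import torch

def fn(t):                       # toy stand-in for the real model
    return (t ** 2).sum()

x = torch.rand(4, 3, requires_grad=True)
i = 1

# Former way: differentiate w.r.t. the whole batch, then index.
grad_x, = torch.autograd.grad(fn(x[i]), x)
grad_via_batch = grad_x[i]       # rows other than i are all zero

# Latter way: slice first, so the graph only ever touches x[i].
u = x[i]
grad_via_row, = torch.autograd.grad(fn(u), u)

print(torch.allclose(grad_via_batch, grad_via_row))  # True
```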

Hello. I have the same error, and I described my question in detail here: Calculating loss with autograd: One of the differentiated Tensors appears to not have been used in the graph

Is this happening because I need to create a custom pytorch function for the curl?

If it shares memory, then clearly output does have a gradient with respect to the value of a[0]. If I changed a[0] and then re-evaluated the loss, it would be updated; we could then measure that update relative to how much I changed a[0]. **The very definition of the gradient.**

```
import torch
a = torch.rand(10, requires_grad=True)
output = (2 * a).sum()
print(output)
a[0] +=1
output2 = (2 * a).sum()
print(output2)
```

```
tensor(11.2303, grad_fn=<SumBackward0>)
tensor(13.2303, grad_fn=<SumBackward0>)
```

I guess my example code above does not actually work in a more up-to-date version of pytorch. It throws an error instead...

Which admittedly weakens my argument that it's clear there is a gradient between a[0] and output, but it does not satisfy me. If I wanted to know the cost/gradient when a is shifted in a single dimension, I would want the above code to work.
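For what it's worth, the in-place error can be sidestepped by perturbing the leaf under `torch.no_grad()`. This is a sketch of the finite-difference reading of the gradient, not an autograd feature:

```
import torch

a = torch.rand(10, requires_grad=True)
output = (2 * a).sum()

# Mutating a leaf that requires grad raises an in-place error, so do the
# perturbation outside of autograd's tracking.
with torch.no_grad():
    a[0] += 1.0
output2 = (2 * a).sum()

# The finite difference recovers d(output)/d(a[0]) = 2:
print((output2 - output).item())
```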