Can anyone please give an example of this scenario? I am struggling to understand why this could even happen.
Here is an example:
    import torch

    a = torch.rand(10, requires_grad=True)
    b = torch.rand(10, requires_grad=True)
    output = (2 * a).sum()
    # b was never used to compute output, so this call raises a RuntimeError
    torch.autograd.grad(output, (a, b))
Thank you!! Yeah, it makes sense now.
Is there any way to add b to the graph so that the derivative can be computed?
If b is not in the graph, then the derivative is just 0 everywhere. You don’t need to add it to the graph to get the derivatives.
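For anyone who wants to see the "just 0 everywhere" point in code, here is a minimal sketch reusing the a, b and output from the example above. It relies on the allow_unused argument of torch.autograd.grad, which returns None for inputs that are not in the graph. Replacing that None with zeros is my own convention for illustration, not something autograd does for you.

```python
import torch

a = torch.rand(10, requires_grad=True)
b = torch.rand(10, requires_grad=True)
output = (2 * a).sum()

# b is not in the graph; allow_unused=True tells autograd to tolerate that
# instead of raising an error.
grad_a, grad_b = torch.autograd.grad(output, (a, b), allow_unused=True)

# For the unused input, autograd returns None rather than a tensor.
# Mathematically the derivative is zero, so substitute zeros by hand.
if grad_b is None:
    grad_b = torch.zeros_like(b)
```

After this, grad_a is 2 in every entry (the derivative of the sum of 2 * a) and grad_b is all zeros.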
Thanks for your reply. I asked because of the following (I hope you can shed some light): I have a model whose forward pass uses features a that were calculated as a function of another vector c without pytorch. I used numpy and then converted a to a tensor (that’s why they are not part of the graph, I guess). The output of the model I am working on, a scalar, is mathematically differentiable with respect to c. However, because c is not registered anywhere in the computational graph, it will not be possible to compute the derivative of the output with respect to it. Is this correct? And is there no way to compute that gradient with autograd unless c is used somewhere and registered in the graph?
You are right that if you don’t use pytorch for some things, you won’t be able to use autograd.
The way to get around this is to create a custom Function (see here how to) that specifies how to compute the backward for a given op. That way, you can wrap the code that the autograd cannot handle in the forward and write the backward by hand there. And then you will be able to use this as any differentiable function in the autograd.
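To make the custom Function idea concrete, here is a rough sketch under some assumptions. The numpy computation is not shown in this thread, so I use a = sin(c) as a stand-in, and the class name NumpySin is hypothetical. The point is only the pattern: the forward runs numpy code that autograd cannot see, and the backward supplies the derivative by hand.

```python
import numpy as np
import torch


class NumpySin(torch.autograd.Function):
    """Hypothetical stand-in op: a = sin(c), with the forward done in numpy."""

    @staticmethod
    def forward(ctx, c):
        # Keep the input around so backward can evaluate the derivative at c.
        ctx.save_for_backward(c)
        # numpy does the actual work here; autograd cannot trace through it.
        a_np = np.sin(c.detach().cpu().numpy())
        return torch.from_numpy(a_np).to(c.device)

    @staticmethod
    def backward(ctx, grad_output):
        (c,) = ctx.saved_tensors
        # Hand-written chain rule: d sin(c) / dc = cos(c).
        return grad_output * torch.cos(c)


c = torch.rand(10, dtype=torch.float64, requires_grad=True)
a = NumpySin.apply(c)       # a is now connected to c in the graph
output = (2 * a).sum()      # stand-in for the rest of the model
(grad_c,) = torch.autograd.grad(output, c)
```

Here grad_c should match 2 * cos(c). For a real feature computation you would replace both the numpy call in forward and the formula in backward with your own, and torch.autograd.gradcheck is a handy way to verify the hand-written backward against finite differences.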
Thanks for your reply. The link seems very helpful. Hopefully I can extend the functionality as I need.