Getting a gradient d(df/dx)/df

Let us define the real-valued functions f and g as follows:

f(x) = y (for instance f(x) = x^2)
g(y) = dy/dx

The objective is to compute dg/dy.

g(y) is computed using torch.autograd.grad. If I then try to compute dg/dy by calling torch.autograd.grad again, it fails with the message “RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.”

It seems that y is not part of the computation graph created after calling the first torch.autograd.grad, but I do not understand why. Here is a minimal example to reproduce the problem:

import torch
import torch.autograd as ag

# Input x
x = torch.empty((1, ), dtype=torch.float32).uniform_(-1., 1.)
x.requires_grad = True

# f(x) = y
y = x ** 2.

# g(y) = dy/dx
A = ag.grad(y, x, torch.ones((1, ), dtype=torch.float32), create_graph=True)[0]

# dg / dy - fails
B = ag.grad(A, y, torch.ones((1, ), dtype=torch.float32), create_graph=True)[0]

I assumed that the computation graph would look as follows:

     f         g         dg/dy
x ------> y ------> A ----------> B

But it seems that the branch connecting y to A does not exist (it is not created by the first call to torch.autograd.grad).

Would anyone have an idea what is going on here and how I could get such a derivative? Thank you!


A is computed as 2 * x, so in the graph A depends only on x, not on y.
In particular, you might want to change your notation to avoid mixing symbolic variables and values.
If we write f: x -> y, then we can evaluate it at a given value x0 to get y0 = f(x0).
But then A = (df/dx)(x0), the derivative of f evaluated at the point x0.
y0 is not actually an input of g here (it is just an intermediate result), so writing g(y) (or g(y0)) does not really make sense. It is actually g(x0).
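
If a value for dA/dy is still wanted, one possible workaround (not something stated above, just a sketch) is to use the chain rule dA/dy = (dA/dx) / (dy/dx). This assumes dy/dx is nonzero at x0, so that y can locally be treated as an invertible function of x. The fixed input 0.5 and the names unused, dA_dx and B are only illustrative.

```python
import torch
import torch.autograd as ag

# Fixed nonzero input so that dy/dx != 0 (assumption needed for the chain rule below)
x = torch.tensor([0.5], requires_grad=True)

# y0 = f(x0) = x0 ** 2
y = x ** 2

# A = (df/dx)(x0) = 2 * x0; its graph leads back to x, not through y
A = ag.grad(y, x, torch.ones_like(y), create_graph=True)[0]

# With allow_unused=True the failing call returns None instead of raising,
# which confirms that y is not an input of A in the graph
unused = ag.grad(A, y, torch.ones_like(A), allow_unused=True, retain_graph=True)[0]

# Second derivative d2y/dx2, obtained by differentiating A with respect to x
dA_dx = ag.grad(A, x, torch.ones_like(A), create_graph=True)[0]

# Chain rule: dA/dy = (dA/dx) / (dy/dx), valid only where dy/dx != 0
B = dA_dx / A
```

For f(x) = x^2 this gives B = 2 / (2 * x0) = 1 / x0. Near x0 = 0 the division blows up, which reflects that y is not an invertible function of x there.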