How can I obtain the gradient of a gradient?

Say I have a function f_w(x) with input x and parameters w.

To optimize it, I obtain the gradient of a custom loss function g_q(y), parametrized by q, with respect to w. Let’s call it w.grad.

Next I want to obtain the gradient of w.grad with respect to the parameters q of the loss function.
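Concretely, with the numbers in the code below: loss = g_q(f_w(x)) = q * (w * x), so w.grad = d loss / d w = q * x = 5 * 8 = 40, and what I am after is d(w.grad) / d q = x = 8.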

How can I do it? See some code below.

import torch

class MyFun:
    def __init__(self, d_input, d_output, constant):
        self.W = constant * torch.ones((d_input, d_output), requires_grad=True)

    def fp(self, x):
        # W is a non-leaf tensor (product of constant and ones),
        # so retain_grad() is needed for W.grad to be populated
        self.W.retain_grad()
        return x @ self.W

f = MyFun(1, 1, torch.tensor([2.]))  # parameters w
g = MyFun(1, 1, torch.tensor([5.]))  # parameters q

x = torch.tensor([8.])

out = f.fp(x)     # f_w(x)
loss = g.fp(out)  # g_q(f_w(x))

loss.backward(retain_graph=True)

print(x.grad, f.W.grad, g.W.grad)

gradient = f.W.grad

gradient.backward()  # RuntimeError here

The print statement outputs “None tensor([[40.]]) tensor([[16.]])”, as it should (x does not require grad, d loss / d w = q * x = 40, and d loss / d q = w * x = 16).

gradient.backward() gives me the error “RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn”.
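From what I can tell, backward() defaults to create_graph=False, so the tensor stored in f.W.grad is detached from the graph: it neither requires grad nor has a grad_fn, hence the error. Computing the first gradient with torch.autograd.grad(..., create_graph=True) instead seems to give what I want; a minimal sketch reusing MyFun from above:

f = MyFun(1, 1, torch.tensor([2.]))
g = MyFun(1, 1, torch.tensor([5.]))
x = torch.tensor([8.])

loss = g.fp(f.fp(x))

# First gradient d loss / d w, kept differentiable via create_graph=True
w_grad, = torch.autograd.grad(loss, f.W, create_graph=True)
print(w_grad)  # tensor([[40.]]) == q * x

# Second gradient d (d loss / d w) / d q
q_grad, = torch.autograd.grad(w_grad.sum(), g.W)
print(q_grad)  # tensor([[8.]]) == x

Unlike backward(), torch.autograd.grad does not accumulate into the .grad attributes, so the first- and second-order gradients stay separate; loss.backward(create_graph=True) would also keep f.W.grad attached to the graph, at the cost of mixing gradients of gradients into the same .grad fields.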

I don’t know if this question is still relevant, but I included some recursive gradient examples in my post error-by-recursively-calling-jacobian-in-a-for-loop.
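The basic pattern in those examples is to pass create_graph=True to every torch.autograd.grad call except the last one; a minimal sketch of that idea (a toy scalar function, not the exact code from that post):

import torch

def nth_derivative(fn, x, n):
    # Differentiate a scalar function n times by calling
    # torch.autograd.grad in a loop, keeping the graph alive
    # for every step except the final one
    y = fn(x)
    for i in range(n):
        y, = torch.autograd.grad(y, x, create_graph=(i < n - 1))
    return y

x = torch.tensor(2., requires_grad=True)
print(nth_derivative(lambda t: t ** 4, x, 2))  # 12 * x**2 = tensor(48.)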