Say I have a function f_w(x) with input x and parameters w.

To optimize it, I compute the gradient of a custom loss function g_q(y), parametrized by q, with respect to w. Let’s call it w.grad.

Next I want to obtain the gradients of w.grad with respect to the parameters q of the loss function.

How can I do it? See some code below.

```
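import torch


class MyFun:
    def __init__(self, d_input, d_output, constant):
        # W is the result of a multiplication, so it is a non-leaf tensor.
        self.W = constant * torch.ones((d_input, d_output), requires_grad=True)

    def fp(self, x):
        # Needed so that .grad is populated for the non-leaf tensor W.
        self.W.retain_grad()
        return x @ self.W


f = MyFun(1, 1, torch.tensor([2.]))  # the function f_w
g = MyFun(1, 1, torch.tensor([5.]))  # the loss g_q
x = torch.tensor([8.])

out = f.fp(x)
loss = g.fp(out)
loss.backward(retain_graph=True)
print(x.grad, f.W.grad, g.W.grad)

gradient = f.W.grad
gradient.backward()  # raises the RuntimeError quoted below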
```

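The print statement outputs “None tensor([[40.]]) tensor([[16.]])”, as expected: x does not require gradients, so x.grad is None, while f.W.grad = 5 · 8 = 40 and g.W.grad = 2 · 8 = 16.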

gradient.backward() gives me the error “RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn”.
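
For reference, here is a minimal sketch of what I am trying to compute. It is not a confirmed fix for the class above. It assumes the same toy numbers but uses plain leaf tensors `w` and `q` instead of `MyFun` (a simplification of mine), and it uses `torch.autograd.grad` with `create_graph=True` so that the first gradient is itself part of the graph. In this scalar case loss = q · w · x, so d(loss)/dw = q · x, and differentiating that with respect to q should give x.

```python
import torch

# Same toy values as above, written as plain leaf tensors (an assumption for brevity).
w = torch.tensor([[2.0]], requires_grad=True)  # parameters of f
q = torch.tensor([[5.0]], requires_grad=True)  # parameters of the loss g
x = torch.tensor([8.0])

out = x @ w                 # f_w(x)
loss = (out @ q).sum()      # g_q(f_w(x)), reduced to a scalar

# create_graph=True records the backward pass, so grad_w stays differentiable.
(grad_w,) = torch.autograd.grad(loss, w, create_graph=True)

# Differentiate d(loss)/dw with respect to q.
(grad_w_wrt_q,) = torch.autograd.grad(grad_w.sum(), q)

# grad_w holds 40 (= q * x) and grad_w_wrt_q holds 8 (= x).
print(grad_w, grad_w_wrt_q)
```

Unlike `f.W.grad` after a plain `loss.backward()`, `grad_w` here has a `grad_fn`, which appears to be what the error message is complaining about.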