The problem I am trying to solve has the following setup. Let `f(x; theta)` and `g(y; phi)` be two neural networks with scalar outputs, where `theta, phi` are the parameters I want to learn and `x, y` are vector-valued inputs to these networks. To train the parameters, the loss function I have is (matching the code below):

`L(theta, phi) = f(nabla_y g(y; phi); theta) - f(x; theta) - y . nabla_y g(y; phi)`

The issue is that the loss `L` already contains a gradient with respect to the input `y`, in the form of `nabla g(y)`, so I am wondering whether it is possible to differentiate it again with respect to both `theta` and `phi`. The tentative code I have is the following, but I am not sure it will work:

```
import torch

# Assuming that f_theta and g_phi are predefined neural networks.
# Variable is deprecated; plain tensors with requires_grad=True suffice.
x = torch.randn(1, 2, requires_grad=True)
y = torch.randn(1, 2, requires_grad=True)

# create_graph=True keeps the graph of the gradient computation,
# so grad_g_y remains differentiable with respect to phi
grad_g_y = torch.autograd.grad(g_phi(y).sum(), y, create_graph=True)[0]

f_grad_g_y = f_theta(grad_g_y)
f_x = f_theta(x)
# Reduce to a scalar so backward() needs no explicit gradient argument
loss = (f_grad_g_y - f_x - torch.sum(y * grad_g_y, dim=1)).mean()

# Computing gradients with respect to theta and phi
loss.backward()
for p in f_theta.parameters():
    print(p.grad)
for p in g_phi.parameters():
    print(p.grad)
```