I have a neural network `ANN(x)` that is a function of x but also has parameters to optimize. I need to calculate its derivative with respect to x, and then I would like to run gradient descent and optimize `dANN(x)/dx`. This requires taking the derivative of `dANN(x)/dx` with respect to the parameters of `ANN(x)`.
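
To make the setup concrete, this is the kind of double differentiation I mean (a minimal sketch with a placeholder network, batch size, and loss, not my actual model):

```
import torch

# placeholder network and data, just to make the shapes concrete
ann = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1))
x = torch.randn(32, 8, requires_grad=True)

y = ann(x).squeeze(-1)                    # ANN(x), shape (32,)
# dANN/dx: create_graph=True so this derivative can itself be
# differentiated with respect to the network parameters
dydx, = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y), create_graph=True)

loss = dydx.pow(2).mean()                 # some objective defined on dANN(x)/dx
loss.backward()                           # populates .grad on ann's parameters
```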

I can do this with the autograd `jacobian` function, but that's really slow. I would like to do this with the faster `grad` or `backward` functions and `torch.autograd.Function`, but I'm not sure how to use the `backward` function so that each parameter in `dANN(x)/dx` gets the correct gradient, especially since the input to the `forward` function is a list of parameters (i.e. `ANN.parameters()`), each of which should receive its own gradient.
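
For reference, my understanding of the basic contract is that `backward` must return exactly one gradient per input of `forward`, in the same order. A toy example with two tensor inputs (names purely illustrative):

```
import torch

class Scale(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, w):              # two tensor inputs, assumed same shape
        ctx.save_for_backward(x, w)
        return x * w

    @staticmethod
    def backward(ctx, grad_output):      # one gradient returned per input, same order
        x, w = ctx.saved_tensors
        return grad_output * w, grad_output * x
```

My problem is doing the same thing when the inputs are all of `ANN.parameters()` and the output is `dANN(x)/dx`.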

For example, I have:

```
import torch
from torch.autograd import grad

# model, tempstate, batch, and device are defined elsewhere

class derivativeANN(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        # input is the list of model parameters, e.g. list(model.parameters())
        tempstate2 = tempstate.detach().requires_grad_(True)
        ans = model(tempstate2.reshape(-1, 8))[:, 1]
        # create_graph=True so ds can itself be differentiated again in backward
        ds = grad(ans, tempstate2,
                  grad_outputs=torch.ones((batch,)).to(device),
                  create_graph=True, retain_graph=True)
        # save_for_backward may only be called once, with all tensors at once
        ctx.save_for_backward(ds[0], *input)
        return ds[0][:, :2]

    @staticmethod
    def backward(ctx, grad_output):
        ds, *params = ctx.saved_tensors
        dsdpar = []
        for each in params:
            dsdpar.append(grad(ds, each,
                               grad_outputs=torch.ones_like(ds),
                               retain_graph=True))
```

But now I need each element of `dsdpar` to update the grad of each element of the parameter list `list(model.parameters())`. I'm sure there is a way to optimize through `torch.autograd.Function` with multiple inputs, I just don't know how.
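
What I am imagining is roughly the structure below: every parameter tensor is passed to `forward` as its own argument (so autograd knows the output depends on them), and `backward` returns one gradient per argument. This is only a sketch of the shape I'm after, with placeholder names, and I haven't verified the double-differentiation details:

```
import torch
from torch.autograd import grad

# model is assumed to be defined globally, as in the snippet above

class DerivOfANN(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, *params):
        # params are passed only so autograd connects the output to them;
        # they must be given in the same order as model.parameters()
        with torch.enable_grad():
            x2 = x.detach().requires_grad_(True)
            ans = model(x2.reshape(-1, 8))[:, 1]
            # create_graph=True keeps ds differentiable w.r.t. the parameters
            ds, = grad(ans, x2, grad_outputs=torch.ones_like(ans), create_graph=True)
        ctx.save_for_backward(ds)
        # return a detached copy so the saved ds keeps its own graph
        return ds.detach()

    @staticmethod
    def backward(ctx, grad_output):
        ds, = ctx.saved_tensors
        # chain rule: push grad_output through ds into every parameter
        param_grads = grad(ds, list(model.parameters()),
                           grad_outputs=grad_output,
                           retain_graph=True, allow_unused=True)
        # one return value per forward input: None for x, then one per parameter
        return (None, *param_grads)

# intended usage (untested): out = DerivOfANN.apply(x, *model.parameters())
```

Is this the right way to wire up `torch.autograd.Function` with multiple parameter inputs, or is there a cleaner pattern?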