I have a neural network ANN(x) that is a function of x but also has parameters to optimize. I need to calculate the derivative of the network with respect to x, and then I would like to run gradient descent on dANN(x)/dx. That requires taking the derivative of dANN(x)/dx with respect to the parameters of ANN(x). I can do this with the autograd jacobian function, but it's really slow. I would like to do it with the faster grad or backward functions and a torch.autograd.Function, but I'm not sure how to write the backward function so that each parameter in dANN(x)/dx gets the correct gradient, especially since the input to the forward function is a list of parameters (i.e. ANN.parameters()), each of which should receive its own gradient.
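I believe the plain-autograd version of what I want looks roughly like this; a minimal sketch with a made-up toy net standing in for my ANN (the sizes and the loss are invented). The key point is create_graph=True, which keeps the graph of the first derivative so it can be differentiated again:

```python
import torch

# Hypothetical stand-in for ANN: 8 inputs -> 2 outputs
net = torch.nn.Sequential(
    torch.nn.Linear(8, 32), torch.nn.Tanh(), torch.nn.Linear(32, 2))
x = torch.randn(16, 8, requires_grad=True)

out = net(x)[:, 1]                  # the output column I differentiate
ds = torch.autograd.grad(
    out, x,
    grad_outputs=torch.ones_like(out),
    create_graph=True)[0]           # dANN(x)/dx, itself still differentiable

loss = ds.pow(2).mean()             # some loss built from dANN(x)/dx
loss.backward()                     # fills .grad on every parameter of net
```

What I can't figure out is how to package this inside a torch.autograd.Function so that backward hands each parameter its gradient.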
For example, I have:

```python
import torch
from torch.autograd import grad

class derivativeANN(torch.autograd.Function):
    @staticmethod
    def forward(ctx, *params):
        # params is model.parameters() unpacked; tempstate, model, batch,
        # and device come from the surrounding scope
        tempstate2 = tempstate.detach().requires_grad_(True)
        ans = model(tempstate2.reshape(-1, 8))[:, 1]
        # create_graph=True (not just retain_graph) so ds can be
        # differentiated again in backward
        ds = grad(ans, tempstate2,
                  grad_outputs=torch.ones((batch,)).to(device),
                  create_graph=True)[0]
        # save_for_backward must be called exactly once: every call
        # overwrites whatever the previous call saved
        ctx.save_for_backward(ds, *params)
        return ds[:, :2]

    @staticmethod
    def backward(ctx, grad_output):
        ds, *params = ctx.saved_tensors
        dspar = []
        for each in params:
            dspar.append(grad(ds, each,
                              grad_outputs=torch.ones_like(ds),
                              retain_graph=True)[0])
        # backward still needs to return these gradients, which is where I'm stuck
```
But now I need each element of dspar to update the .grad of the corresponding element of list(model.parameters()). I'm sure there is a way to use torch.autograd.Function with multiple inputs for this; I just don't know how.
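From the docs, my understanding is that backward must return exactly one gradient (or None) per argument of forward, in the same order. Here is a toy sketch of that multi-input plumbing (the arithmetic is meaningless and the names are made up; the point is the shape of backward's return value):

```python
import torch

class WeightedSum(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, *params):
        # one save_for_backward call with every tensor at once
        ctx.save_for_backward(x, *params)
        out = x.clone()
        for p in params:
            out = out + p.sum() * x     # out = x * (1 + sum of all p.sum())
        return out

    @staticmethod
    def backward(ctx, grad_output):
        x, *params = ctx.saved_tensors
        grad_x = grad_output.clone()
        grad_params = []
        for p in params:
            # each parameter gets its own gradient, in forward's order
            grad_params.append(torch.ones_like(p) * (grad_output * x).sum())
            grad_x = grad_x + p.sum() * grad_output
        # one return slot per forward argument: (x, *params)
        return (grad_x, *grad_params)

x = torch.randn(5, requires_grad=True)
lin = torch.nn.Linear(3, 3)
out = WeightedSum.apply(x, *lin.parameters())
out.sum().backward()                    # lin.weight.grad and lin.bias.grad are set
```

If that's the right pattern, then presumably the per-parameter results in dspar are what my backward should return, in the order the parameters were passed to forward, but I'd like confirmation.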