I have a neural network ANN(x) that is a function of x but also has parameters to optimize. I need to calculate the derivative of the network with respect to x, and then I would like to run gradient descent and optimize dANN(x)/dx itself. This requires taking the derivative of dANN(x)/dx with respect to the parameters of ANN(x).
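
Writing θ for the parameters, the quantity I need for the parameter update is the mixed second derivative

$$\frac{\partial}{\partial \theta}\left[\frac{\partial\,\mathrm{ANN}(x;\theta)}{\partial x}\right].$$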
I can do this with the autograd `jacobian` function, but that's really slow; a sketch of what I'm doing now is below. I would like to do it instead with the faster `grad` or `backward` functions and a custom `torch.autograd.Function`, but I'm not sure how to write the `backward` method so that each parameter in dANN(x)/dx gets the correct gradient, especially since the input to the `forward` function is a list of parameters (i.e. `model.parameters()`), each of which should receive its own gradient.
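
Roughly, the slow version looks like this (a sketch: `model`, `x`, `batch`, the `(-1, 8)` reshape, and the `[:, 1]` output slice are the same as in my code below; the loss on the derivative is just a placeholder):

```python
import torch
from torch.autograd.functional import jacobian

# dANN/dx via the full jacobian: for a (batch, 8) input this builds a
# (batch, batch, 8) tensor, which is why it is so slow
ds = jacobian(lambda inp: model(inp.reshape(-1, 8))[:, 1], x,
              create_graph=True)      # keep the graph so the parameters get gradients
ds = ds.diagonal(dim1=0, dim2=1).T    # per-sample derivatives, shape (batch, 8)

loss = ds[:, :2].pow(2).sum()         # placeholder loss on the derivative
loss.backward()                       # fills p.grad for every p in model.parameters()
```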
Here is my attempt with `torch.autograd.Function`:
```python
import torch
from torch.autograd import grad

class derivativeANN(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        # forward runs with grad disabled inside a custom Function,
        # so re-enable it to build the inner graph
        with torch.enable_grad():
            tempstate2 = input.detach().requires_grad_(True)
            ans = model(tempstate2.reshape(-1, 8))[:, 1]
            # create_graph=True so ds is itself differentiable w.r.t. the parameters
            ds = grad(ans, tempstate2,
                      grad_outputs=torch.ones((batch,)).to(device),
                      create_graph=True)
        # a second call to save_for_backward overwrites the first,
        # so everything has to be saved in one call
        ctx.save_for_backward(input, ds[0], *model.parameters())
        return ds[0][:, :2]

    @staticmethod
    def backward(ctx, grad_output):
        input, ds, *params = ctx.saved_tensors
        dsdpar = []
        for each in params:
            dsdpar.append(grad(ds, each,
                               grad_outputs=torch.ones_like(ds).to(device),
                               retain_graph=True)[0])
        # ??? how do I return these so each parameter gets its gradient?
```
But now I need each element of `dsdpar` to update the `.grad` of the corresponding element of `list(model.parameters())`. I'm sure there is a way to write a `torch.autograd.Function` with multiple inputs so that each one receives its own gradient, I just don't know how.
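
From the autograd docs, `backward` has to return one gradient per input to `forward`, so if each parameter is passed in individually (via `*model.parameters()`), I think the solution has roughly this shape (untested sketch; `model` and `x` are as above, and the zero-padded `grad_outputs` accounts for only returning the `[:, :2]` slice):

```python
import torch
from torch.autograd import grad

class DerivativeANN(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, *params):
        # forward runs with grad disabled inside a Function, so re-enable it
        with torch.enable_grad():
            x2 = x.detach().requires_grad_(True)   # x assumed (batch, 8)
            ans = model(x2.reshape(-1, 8))[:, 1]
            # create_graph=True keeps dANN/dx connected to the parameters
            ds = grad(ans, x2, grad_outputs=torch.ones_like(ans),
                      create_graph=True)[0]
        ctx.save_for_backward(ds, *params)
        return ds[:, :2]

    @staticmethod
    def backward(ctx, grad_output):
        ds, *params = ctx.saved_tensors
        # forward only returned the [:, :2] slice, so embed grad_output
        # into a full-size grad_outputs tensor for ds
        go = torch.zeros_like(ds)
        go[:, :2] = grad_output
        dsdpar = grad(ds, params, grad_outputs=go, allow_unused=True)
        # one return value per forward input: None for x, then one per parameter
        return (None, *dsdpar)

out = DerivativeANN.apply(x, *model.parameters())
out.pow(2).sum().backward()   # placeholder loss; now each p.grad is filled
```

If that's right, autograd would route each returned gradient to the matching parameter automatically, and a normal optimizer step over `model.parameters()` would then work. Is this the intended way to do it?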