For my problem I am trying to minimize a functional (like a loss function) L: C → R over a neural network N acting on some data x in X.
That is, I want to minimize L(N(x)) over many x by adjusting the parameters of N.
With all variants of SGD / Adam my results were not really good, so I tried
torch.optim.LBFGS, but it is not behaving as I expect (max_eval is not always reached even for very small tolerances, and the result is far from the solution). I would therefore like to implement Newton's method myself.
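For context, this is roughly the calling pattern I used with LBFGS (a minimal sketch with placeholder model and data; the closure is required because LBFGS re-evaluates the loss several times per step):

```python
import torch

torch.manual_seed(0)
# Placeholder model and data, just to show the LBFGS calling pattern
model = torch.nn.Linear(2, 1)
x = torch.randn(32, 2)
y = torch.randn(32, 1)

opt = torch.optim.LBFGS(model.parameters(), lr=1.0, max_iter=20,
                        max_eval=25, tolerance_grad=1e-9,
                        tolerance_change=1e-12)

def closure():
    # LBFGS calls this multiple times per step, so it must recompute everything
    opt.zero_grad()
    loss = torch.mean((model(x) - y) ** 2)
    loss.backward()
    return loss

for _ in range(5):
    opt.step(closure)
```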
To do this I want to update the network's weights manually:

```python
loss = LossFunctional(model, data)
loss.backward()
with torch.no_grad():
    for p in model.parameters():
        p -= torch.matmul(torch.linalg.inv(p.grad), loss)
```
But this does not work when data is in batch format, because backward() is only allowed on scalar outputs. Inverting p.grad obviously fails as well, since it is not a square matrix. So my questions are:
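For the record, a true Newton step would use the Hessian of the scalar loss with respect to the parameters, not the inverse of the gradient. Here is a minimal sketch of what I mean on a tiny hand-flattened linear model (the model, data, and damping constant are placeholders; torch.autograd.functional.hessian only scales to very small parameter counts):

```python
import torch

torch.manual_seed(0)
# Toy data and a hand-flattened 1-layer model (4 parameters total)
x = torch.randn(8, 3)
y = torch.randn(8, 1)
params = torch.zeros(4)  # [w1, w2, w3, b]

def loss_fn(flat):
    w = flat[:3].reshape(1, 3)
    b = flat[3:]
    pred = x @ w.T + b           # shape (8, 1)
    return torch.mean((pred - y) ** 2)

for _ in range(5):
    g = torch.autograd.functional.jacobian(loss_fn, params)  # gradient, shape (4,)
    H = torch.autograd.functional.hessian(loss_fn, params)   # Hessian, shape (4, 4)
    # Damped Newton step: solve H d = g instead of inverting H explicitly
    d = torch.linalg.solve(H + 1e-6 * torch.eye(4), g)
    params = params - d
```

Since this loss is quadratic in the parameters, the step converges essentially in one iteration, but it only works because the whole parameter vector fits in one small flat tensor.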
How can I compute a Jacobian across multiple batches? Is this possible at all? And is there a better implementation of a Newton-method solver than writing it by hand or using LBFGS?
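Regarding the per-batch Jacobian, the closest thing I have found is torch.func, which can compute one Jacobian per sample by mapping jacrev over the batch dimension (a sketch, assuming PyTorch ≥ 2.0; the model and shapes here are placeholders):

```python
import torch
from torch.func import functional_call, jacrev, vmap

model = torch.nn.Linear(4, 2)
params = dict(model.named_parameters())
x = torch.randn(16, 4)  # a batch of 16 inputs

def output(p, xi):
    # Run the model with an explicit parameter dict on a single sample
    return functional_call(model, p, (xi,))

# jacrev differentiates w.r.t. the params dict; vmap maps it over the batch
jac = vmap(jacrev(output), in_dims=(None, 0))(params, x)
print(jac["weight"].shape)  # one (2, 2, 4) Jacobian block per sample
```

This gives the Jacobian of the network output with respect to each parameter tensor, for every sample in the batch at once, without looping over the batch in Python.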