Newton's method / multidimensional autograd

Hi there,
For my problem I am trying to minimize a functional (like a loss function) L: C → R over a neural network N which acts on some data x in X.
So I seek to minimize L(N(x)) for many x by adjusting the parameters of N.
With all sorts of SGD / Adam variants my results were not really good, so I tried torch.optim.LBFGS, but it does not behave the way I want (max_eval is not always reached even for very, very small tolerances, and the result is far away from the solution). So I would like to implement Newton's method on my own.
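For reference, this is roughly how I am calling LBFGS (a minimal, self-contained sketch; the toy model, data, and loss are placeholders for my actual setup, and the tolerances are only illustrative):

import torch

# Placeholders so the snippet runs on its own; in my code, `model`, `data`,
# and `LossFunctional` are my actual network, batch, and functional.
model = torch.nn.Linear(3, 1)
data = torch.randn(8, 3)

def LossFunctional(model, data):
    return model(data).pow(2).mean()  # dummy scalar loss

optimizer = torch.optim.LBFGS(
    model.parameters(),
    max_iter=100,
    max_eval=200,            # in my runs, max_eval is often not reached
    tolerance_grad=1e-12,    # illustrative; I set very small tolerances
    tolerance_change=1e-12,
)

def closure():
    # LBFGS calls this closure itself, possibly several times per step.
    optimizer.zero_grad()
    loss = LossFunctional(model, data)
    loss.backward()
    return loss

optimizer.step(closure)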

To implement Newton's method, I would like to update the network's weights on my own:

loss = LossFunctional(model, data)
loss.backward()
with torch.no_grad():
    for p in model.parameters():
        # attempted update: p <- p - (p.grad)^{-1} * loss
        p -= torch.matmul(torch.linalg.inv(p.grad), loss)

But this does not work if data is in batch format, because backward() is only allowed on scalar outputs. Inverting p.grad then obviously does not work either, since it is not a square matrix. So my questions are:

How can I get a Jacobian over a whole batch? Is this possible? And is there a better implementation of a Newton-method solver than doing it by hand or using LBFGS?
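To make the first question concrete: what I am after is something like one Jacobian (or gradient) per sample. The closest I can see is torch.func with jacrev and vmap (assuming PyTorch >= 2.0); a rough sketch with a toy network and per-sample loss standing in for my actual N and L:

import torch
from torch.func import functional_call, jacrev, vmap

# Toy stand-ins for my actual network N and functional L.
model = torch.nn.Linear(3, 1)
params = dict(model.named_parameters())
data = torch.randn(8, 3)  # a batch of 8 samples

def per_sample_loss(params, x):
    # Run the model functionally with an explicit parameter dict,
    # so that jacrev can differentiate with respect to the parameters.
    out = functional_call(model, params, (x.unsqueeze(0),))
    return out.squeeze()  # scalar "loss" for a single sample

# jacrev differentiates the scalar per-sample loss w.r.t. every parameter,
# vmap maps that over the batch dimension -> one gradient row per sample.
per_sample_jac = vmap(jacrev(per_sample_loss), in_dims=(None, 0))(params, data)
# per_sample_jac: dict mapping parameter name -> tensor of shape (batch, *param.shape)

Is something like this the intended way to get a batched Jacobian, or is there a more direct route?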
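And for the second question, this is roughly the Newton update I have in mind, sketched with torch.autograd.functional.hessian on a flattened parameter vector (assuming a scalar loss and an invertible Hessian; the tiny model and loss are again only placeholders):

import torch
from torch.func import functional_call
from torch.nn.utils import parameters_to_vector, vector_to_parameters

# Toy stand-ins for my actual network and functional.
model = torch.nn.Linear(3, 1)
data = torch.randn(8, 3)

def loss_fn(out):
    return out.pow(2).mean()  # dummy scalar loss

def newton_step(model, loss_fn, data):
    # Flatten all parameters into one vector theta.
    theta = parameters_to_vector(model.parameters()).detach()

    def loss_of_theta(flat):
        # Rebuild a parameter dict from the flat vector and evaluate the scalar loss.
        params, offset = {}, 0
        for name, p in model.named_parameters():
            n = p.numel()
            params[name] = flat[offset:offset + n].view_as(p)
            offset += n
        return loss_fn(functional_call(model, params, (data,)))

    grad = torch.autograd.functional.jacobian(loss_of_theta, theta)
    hess = torch.autograd.functional.hessian(loss_of_theta, theta)

    # Newton update: theta <- theta - H^{-1} g (solve instead of an explicit inverse).
    vector_to_parameters(theta - torch.linalg.solve(hess, grad), model.parameters())

newton_step(model, loss_fn, data)

Building and inverting the full Hessian clearly only scales to small networks, which is part of why I am asking whether there is a better ready-made Newton solver.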