Is there a Levenberg-Marquardt-like optimizer for PyTorch?

I have a nonlinear regression NN. I wanted to know whether there is a Levenberg-Marquardt-like optimizer that I can use in my case.

N, D_in, H, D_out = x.shape[0], x.shape[1], 6, y.shape[1]

model = nn.Sequential(OrderedDict([ ('fc1', nn.Linear(D_in, H)), 
                                    #('Sig', nn.Sigmoid()),
                                    ('ISRU', ISRU()), # Add ISRU
                                    ('fc2', nn.Linear(H, D_out))]))

# Error -----
loss_fn = torch.nn.L1Loss(reduction='mean')

# Train -----
optimizer = *****

I tried Googling it and couldn’t find any implementations of that optimizer for PyTorch. Most of what exists are variations on first-order gradient descent. If your gradients are not stochastic (that is, you compute the loss on the full dataset rather than on random minibatches), you might try torch.optim’s implementation of the quasi-Newton optimizer L-BFGS. Be sure to set line_search_fn='strong_wolfe', or you risk the optimizer ‘blowing up’ by accepting a step that increases the loss.

Could you give an example of a training loop with this optimizer?

Here’s an example of minimizing the Rosenbrock function with L-BFGS:

from functools import partial

import torch
from torch import optim


def rosenbrock(x):
    return (1 - x[0])**2 + 100*(x[1] - x[0]**2)**2


a = torch.tensor([1., 1.]) + torch.rand(2)
a.requires_grad_()
loss = partial(rosenbrock, a)


def print_iter(i, a, loss):
    print(f'{i} [{a[0]:.6f}, {a[1]:.6f}], loss: {loss:.6f}')


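opt = optim.LBFGS([a], line_search_fn='strong_wolfe')


def closure():
    # L-BFGS re-evaluates the objective several times per step (for example
    # during the line search), so the closure must clear the gradients,
    # recompute the loss, and backpropagate on every call.
    opt.zero_grad()
    value = loss()
    value.backward()
    return value


print_iter(0, a, loss())
for i in range(200):
    opt.step(closure)
    print_iter(i+1, a, loss())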

If I want to use nn.MSELoss, how should I change this example?
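One way to adapt the loop to a network trained with nn.MSELoss is sketched below. The synthetic data, the Tanh activation, and the layer sizes are assumptions made so that the snippet runs on its own; substitute your own x, y, and model. The key point is that the closure passed to opt.step zeroes the gradients, recomputes the loss on the full batch, and calls backward.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Synthetic regression data (an assumption; replace with your own x and y).
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = torch.sin(3 * x)

# Small stand-in network (an assumption; any nn.Module works the same way).
model = nn.Sequential(nn.Linear(1, 6), nn.Tanh(), nn.Linear(6, 1))
loss_fn = nn.MSELoss()
opt = optim.LBFGS(model.parameters(), line_search_fn='strong_wolfe')


def closure():
    # L-BFGS calls this several times per step, so it must clear the
    # gradients, recompute the full-batch loss, and backpropagate.
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    return loss


initial_loss = loss_fn(model(x), y).item()
for epoch in range(20):
    # step() returns the loss from the first closure evaluation of this step.
    loss = opt.step(closure)
    print(f'{epoch + 1} loss: {loss.item():.6f}')
final_loss = loss_fn(model(x), y).item()
```

Since L-BFGS assumes a deterministic objective, this sketch feeds the whole dataset through the closure rather than iterating over minibatches.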

I know this thread is a bit old, but for anyone still looking for a Levenberg-Marquardt implementation in PyTorch, I’ve developed one: torch-levenberg-marquardt. Hope it helps anyone who comes across this!

Hi Fabio. The issue "Adding Levenberg-marquardt optimizer in PyTorch · Issue #83529 · pytorch/pytorch" says that an implementation built on PyTorch tensors is very slow for training. Do you think your implementation solves that problem?

Hi Robert, I think my implementation doesn’t suffer from the issues mentioned in that discussion. Using jacrev + vmap to compute the Jacobian in PyTorch is fast and memory-efficient, especially compared to other methods I’ve tried in both PyTorch and TensorFlow.
Of course, you cannot expect to train models with billions of parameters using LM, but for certain architectures, I’ve been able to train models with millions of parameters on a GPU.
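For readers wondering what the jacrev + vmap approach looks like, here is a rough sketch of a per-sample Jacobian and a basic damped Gauss-Newton (Levenberg-Marquardt style) update. It is not the torch-levenberg-marquardt API: the helper names residual, flat_jacobian, lm_step, and sse, the toy data, and the simple divide-or-multiply-by-10 damping rule are all assumptions for illustration. It assumes PyTorch 2.0 or later, where torch.func is available. Note that LM targets sum-of-squares objectives, so it pairs naturally with a squared-error loss rather than L1Loss.

```python
import torch
from torch import nn
from torch.func import functional_call, jacrev, vmap

torch.manual_seed(0)

# Toy data and model in float64 (assumptions for illustration only).
x = torch.linspace(-1, 1, 32, dtype=torch.float64).unsqueeze(1)
y = torch.sin(3 * x)
model = nn.Sequential(nn.Linear(1, 6), nn.Tanh(), nn.Linear(6, 1)).double()
params = {k: v.detach() for k, v in model.named_parameters()}


def residual(p, xi, yi):
    # Residual of one sample, returned as a flat vector of length n_out.
    pred = functional_call(model, p, (xi.unsqueeze(0),))
    return (pred.squeeze(0) - yi).reshape(-1)


def flat_jacobian(p):
    # jacrev differentiates with respect to the parameter dict (argument 0);
    # vmap maps that computation over the samples in x and y.
    jac = vmap(jacrev(residual), in_dims=(None, 0, 0))(p, x, y)
    rows = x.shape[0] * y.shape[1]
    # Each entry has shape (N, n_out, *param_shape): flatten and concatenate.
    return torch.cat([jac[k].reshape(rows, -1) for k in p], dim=1)


def sse(p):
    # Sum of squared residuals over the whole dataset.
    r = vmap(residual, in_dims=(None, 0, 0))(p, x, y)
    return (r ** 2).sum().item()


def lm_step(p, damping):
    r = vmap(residual, in_dims=(None, 0, 0))(p, x, y).reshape(-1)
    J = flat_jacobian(p)
    # Solve (J^T J + damping * I) delta = -J^T r for the update direction.
    A = J.T @ J + damping * torch.eye(J.shape[1], dtype=J.dtype)
    delta = torch.linalg.solve(A, -(J.T @ r))
    # Scatter the flat update back into a dict shaped like the parameters.
    new_p, offset = {}, 0
    for k, v in p.items():
        n = v.numel()
        new_p[k] = v + delta[offset:offset + n].view_as(v)
        offset += n
    return new_p


initial_sse = sse(params)
damping = 1e-2
for i in range(50):
    candidate = lm_step(params, damping)
    if sse(candidate) < sse(params):
        # Accept the step and trust the Gauss-Newton model more.
        params, damping = candidate, max(damping / 10, 1e-8)
    else:
        # Reject the step and move toward small gradient-descent-like steps.
        damping *= 10
final_sse = sse(params)
print(f'SSE: {initial_sse:.6f} -> {final_sse:.6f}')
```

The dense J^T J solve is what limits this sketch to small models; a production implementation would need a more careful damping schedule and batching strategy.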