I tried Googling it and couldn’t find any implementations of that optimizer for PyTorch; most of what exists consists of variations on first-order gradient descent. If your gradients are not stochastic, you might try torch.optim’s implementation of the second-order optimizer L-BFGS (be sure to set line_search_fn='strong_wolfe', or you risk the optimizer ‘blowing up’ by accepting a step that increases the loss).
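Something like this, roughly (a minimal sketch with a toy model and data just for illustration, not tuned for any real problem):

```python
import torch

# Toy model and data, purely for illustration.
model = torch.nn.Linear(10, 1)
x = torch.randn(64, 10)
y = torch.randn(64, 1)

optimizer = torch.optim.LBFGS(
    model.parameters(),
    lr=1.0,
    max_iter=20,
    line_search_fn='strong_wolfe',  # rejects steps that would increase the loss
)

def closure():
    # L-BFGS may re-evaluate the objective several times per step,
    # so the loss/gradient computation has to live in a closure.
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    return loss

for _ in range(50):
    optimizer.step(closure)
```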
I know this thread is a bit old, but for anyone still looking for a Levenberg-Marquardt implementation in PyTorch, I’ve developed one: torch-levenberg-marquardt. Hope it helps anyone who comes across this!
Hi Robert, I think my implementation doesn’t suffer from the issues mentioned in that discussion. Using jacrev + vmap to compute the Jacobian in PyTorch is fast and memory-efficient, especially compared to other methods I’ve tried in both PyTorch and TensorFlow.
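For anyone curious what that pattern looks like, here is a minimal sketch using torch.func (not the library’s actual code; the tiny linear model and shapes are just placeholders):

```python
import torch
from torch.func import functional_call, jacrev, vmap

model = torch.nn.Linear(4, 2)          # placeholder model for illustration
params = dict(model.named_parameters())
inputs = torch.randn(128, 4)           # a batch of 128 samples

def output_fn(p, x):
    # Model output for a single sample x under parameters p.
    return functional_call(model, p, (x.unsqueeze(0),)).squeeze(0)

# jacrev differentiates the outputs w.r.t. the parameters;
# vmap maps that per-sample Jacobian over the batch dimension of `inputs`,
# so the full Jacobian is built without a Python loop over samples.
jacobian = vmap(jacrev(output_fn), in_dims=(None, 0))(params, inputs)

# `jacobian` is a dict keyed by parameter name; e.g. the entry for the
# weight has shape (batch, output_dim, *weight.shape) = (128, 2, 2, 4).
print({name: j.shape for name, j in jacobian.items()})
```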
Of course, you cannot expect to train models with billions of parameters using LM, but for certain architectures, I’ve been able to train models with millions of parameters on a GPU.