Loss with LBFGS not going down

Hi,

I was trying out the L-BFGS optimizer to get familiar with it; my goal is to use it in a larger code base.
Specifically, I have a simple cost function $f(x_1,\dots,x_n) = x_1^2 + \dots + x_n^2$ which I am trying to minimize.
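Since the gradient is trivial and the minimum sits at the origin, I would expect the loss to drop to (essentially) zero within a few steps:

$$\nabla f(x) = 2x, \qquad \arg\min_{x} f(x) = 0, \qquad f(0) = 0.$$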

The code is here:

#include <torch/torch.h>

#include <fstream>
#include <iostream>

// Model: Sum of squares function with learnable parameters
struct Model : torch::nn::Module {
  torch::Tensor param;

  Model(int n) { param = register_parameter("param", torch::rand({n})); }

  torch::Tensor forward() { return torch::sum(torch::pow(param, 2)); }
};

std::tuple<float, size_t> queryLearningRate() {
  std::cout << "Please insert your learning rate as fp number: ";
  float learning_rate;
  std::cin >> learning_rate;

  std::cout << "Please insert the number of epochs: ";
  size_t num_epochs;
  std::cin >> num_epochs;

  return {learning_rate, num_epochs};
}

int main() {
  const int dim = 10;  // Number of parameters

  auto [learning_rate, num_epochs] = queryLearningRate();

  auto model = std::make_shared<Model>(dim);

  // Configure LBFGS optimizer
  torch::optim::LBFGSOptions options(learning_rate);
  options.max_iter(20);             // Max number of inner iterations per .step()
  options.tolerance_grad(1e-10);    // Gradient threshold
  options.tolerance_change(1e-12);  // Function value threshold
  options.history_size(100);  // Optional: more memory for curvature approx
  torch::optim::LBFGS optimizer(model->parameters(), options);

  std::ofstream log_file("lbfgs_loss_log.csv");
  log_file << "epoch,loss\n";

  torch::Tensor loss;

  for (size_t epoch = 0; epoch < num_epochs; ++epoch) {
    optimizer.zero_grad();

    auto closure = [&]() -> torch::Tensor {
      loss = model->forward();  // scalar loss
      loss.backward();
      return loss;
    };

    optimizer.step(closure);

    std::cout << "Epoch " << epoch << ", Loss: " << loss.item<float>() << "\n";
    log_file << epoch << "," << loss.item<float>() << "\n";
  }

  log_file.close();

  std::cout << "Training complete.\n";
  std::cout << "Final params: " << model->param << "\n";

  return 0;
}

If you run this code with l_r = 1 and num_epochs = 1000, for example, you will see the loss increase until it becomes NaN. If instead you give l_r < 1 and num_epochs = 1000, the loss won't change at all.

As an example:

(base) root@ba7379ea9923:/TestPyTorch/build# ./SimpleLBFGS/SimpleLBFGS
Please insert your learning rate as fp number: 1e-3
Please insert the number of epochs: 1000
Epoch 0, Loss: 1.65519
Epoch 1, Loss: 1.47941
Epoch 2, Loss: 1.47928
Epoch 3, Loss: 1.47928
Epoch 4, Loss: 1.47928
Epoch 5, Loss: 1.47928
Epoch 6, Loss: 1.47928
Epoch 7, Loss: 1.47928
Epoch 8, Loss: 1.47928
Epoch 9, Loss: 1.47928
Epoch 10, Loss: 1.47928
...
Epoch 61, Loss: 1.47928
...

I therefore wonder whether I'm not using it correctly or whether it's simply a bug.
I've also tried stepping into the code when the step method is called, but it's a bit too much code for me to follow what's going on right now.
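One thing I'm now wondering about: the Python documentation for torch.optim.LBFGS says the closure may be evaluated multiple times per step() and should clear the gradients itself, whereas my code calls zero_grad() once outside the closure. I don't know whether the C++ API behaves the same way, but a minimal sketch of the variant I mean (assuming torch::optim::Optimizer::zero_grad() is the right call here, and with the logging omitted for brevity) would be:

for (size_t epoch = 0; epoch < num_epochs; ++epoch) {
  auto closure = [&]() -> torch::Tensor {
    // Zero the gradients inside the closure, since step() may
    // re-evaluate it several times per call (this is documented
    // for the Python API; I'm assuming the C++ one matches).
    optimizer.zero_grad();
    loss = model->forward();  // scalar loss
    loss.backward();
    return loss;
  };

  optimizer.step(closure);
}

Is that the intended usage, or is the problem elsewhere?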

Can you help me understand how to make it work? I can see other people have used it successfully, so I assume I am doing something wrong.

Versions

Versions of relevant libraries:
[pip3] flake8==7.0.0
[pip3] mypy==1.11.2
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] numpydoc==1.7.0
[pip3] torch==2.8.0a0+git48807d5
[conda] _anaconda_depends         2024.10             py312_mkl_0  
[conda] blas                      1.0                         mkl  
[conda] mkl                       2023.1.0         h213fc3f_46344  
[conda] mkl-service               2.4.0           py312h5eee18b_1  
[conda] mkl_fft                   1.3.10          py312h5eee18b_0  
[conda] mkl_random                1.2.7           py312h526ad5a_0  
[conda] numpy                     1.26.4          py312hc5e2394_0  
[conda] numpy-base                1.26.4          py312h0da6c21_0  
[conda] numpydoc                  1.7.0           py312h06a4308_0  
[conda] torch                     2.8.0a0+git48807d5           dev_0    <develop>