Hi,
I was trying out the L-BFGS optimizer to familiarize myself with it; my goal is to use it in a bigger code base.
Specifically, I have a simple cost function $f(x_1,\dots,x_n) = x_1^2 + \dots + x_n^2$ that I am trying to minimize.
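Since $\nabla f(x) = 2x$, the unique minimum is at $x = 0$ with $f(0) = 0$, so it should be easy to tell whether the optimizer is converging.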
The code is here:
#include <torch/torch.h>

#include <fstream>
#include <iostream>
#include <tuple>

// Model: sum-of-squares function with learnable parameters.
struct Model : torch::nn::Module {
  torch::Tensor param;
  Model(int n) { param = register_parameter("param", torch::rand({n})); }
  torch::Tensor forward() { return torch::sum(torch::pow(param, 2)); }
};

std::tuple<float, size_t> queryLearningRate() {
  std::cout << "Please insert your learning rate as fp number: ";
  float learning_rate;
  std::cin >> learning_rate;
  std::cout << "Please insert the number of epochs: ";
  size_t num_epochs;
  std::cin >> num_epochs;
  return {learning_rate, num_epochs};
}

int main() {
  const int dim = 10; // number of parameters
  auto [learning_rate, num_epochs] = queryLearningRate();
  auto model = std::make_shared<Model>(dim);

  // Configure the LBFGS optimizer.
  torch::optim::LBFGSOptions options(learning_rate);
  options.max_iter(20);            // number of inner iterations per .step()
  options.tolerance_grad(1e-10);   // gradient threshold
  options.tolerance_change(1e-12); // function-value threshold
  options.history_size(100);       // optional: more memory for the curvature approximation
  torch::optim::LBFGS optimizer(model->parameters(), options);

  std::ofstream log_file("lbfgs_loss_log.csv");
  log_file << "epoch,loss\n";

  torch::Tensor loss;
  for (size_t epoch = 0; epoch < num_epochs; ++epoch) {
    optimizer.zero_grad();
    auto closure = [&]() -> torch::Tensor {
      loss = model->forward(); // scalar loss
      loss.backward();
      return loss;
    };
    optimizer.step(closure);
    std::cout << "Epoch " << epoch << ", Loss: " << loss.item<float>() << "\n";
    log_file << epoch << "," << loss.item<float>() << "\n";
  }
  log_file.close();

  std::cout << "Training complete.\n";
  std::cout << "Final params: " << model->param << "\n";
  return 0;
}
If you run this code with learning rate = 1 and num_epochs = 1000, for example, you will see the loss increase until it becomes NaN. If instead you give a learning rate < 1 (again with num_epochs = 1000), the loss stops changing after the first couple of epochs. For example:
(base) root@ba7379ea9923:/TestPyTorch/build# ./SimpleLBFGS/SimpleLBFGS
Please insert your learning rate as fp number: 1e-3
Please insert the number of epochs: 1000
Epoch 0, Loss: 1.65519
Epoch 1, Loss: 1.47941
Epoch 2, Loss: 1.47928
(epochs 3 through 61 all print Loss: 1.47928)
...
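For comparison, the kind of baseline I would use to sanity-check the model itself is a plain SGD loop like the sketch below (the 0.1 learning rate and the variable names are arbitrary choices of mine):
torch::optim::SGD sgd(model->parameters(), torch::optim::SGDOptions(0.1));
for (size_t epoch = 0; epoch < num_epochs; ++epoch) {
  sgd.zero_grad();
  torch::Tensor l = model->forward(); // f(x) = sum_i x_i^2
  l.backward();                       // df/dx_i = 2 x_i
  sgd.step();
  std::cout << "Epoch " << epoch << ", Loss: " << l.item<float>() << "\n";
}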
I therefore wonder whether I am not using it correctly or whether this is a bug; for a simple convex quadratic like this I would expect L-BFGS to converge essentially to machine precision within a few steps. I have also tried to step into the code when the step method is called, but it is too much code for me to understand what is going on right now.
Can you help me understand how to make it work? I can see that other people have used it successfully, so I am assuming I am doing something wrong.
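One more detail I am unsure about: the example in the Python torch.optim.LBFGS documentation clears the gradients inside the closure, since step() may re-evaluate the closure several times during the line search, while my code calls zero_grad() once per epoch outside it. If the C++ frontend has the same contract, which is an assumption on my part, the loop would presumably have to look like this:
for (size_t epoch = 0; epoch < num_epochs; ++epoch) {
  auto closure = [&]() -> torch::Tensor {
    optimizer.zero_grad(); // cleared inside the closure, since LBFGS may
                           // call it several times within a single step()
    loss = model->forward();
    loss.backward();
    return loss;
  };
  optimizer.step(closure);
  std::cout << "Epoch " << epoch << ", Loss: " << loss.item<float>() << "\n";
}
Is that the intended usage in C++ as well?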
Versions
Versions of relevant libraries:
[pip3] flake8==7.0.0
[pip3] mypy==1.11.2
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] numpydoc==1.7.0
[pip3] torch==2.8.0a0+git48807d5
[conda] _anaconda_depends 2024.10 py312_mkl_0
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h213fc3f_46344
[conda] mkl-service 2.4.0 py312h5eee18b_1
[conda] mkl_fft 1.3.10 py312h5eee18b_0
[conda] mkl_random 1.2.7 py312h526ad5a_0
[conda] numpy 1.26.4 py312hc5e2394_0
[conda] numpy-base 1.26.4 py312h0da6c21_0
[conda] numpydoc 1.7.0 py312h06a4308_0
[conda] torch 2.8.0a0+git48807d5 dev_0 <develop>