I am trying to implement an MLP network with the L-BFGS optimizer to solve a classification problem. I have implemented a network in sklearn with the following configuration:
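A hypothetical configuration of that kind (the exact values from the original post are not shown here) might look like:

```python
from sklearn.neural_network import MLPClassifier

# Hypothetical values for illustration only.
# With solver='lbfgs', max_iter bounds the number of gradient steps
# and max_fun bounds the number of loss function calls.
clf = MLPClassifier(
    hidden_layer_sizes=(64,),
    activation='relu',
    solver='lbfgs',
    max_iter=200,
    max_fun=15000,
)
```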

But I don’t know how these hyperparameters should be configured in PyTorch. For example, for the “lbfgs” solver, ‘max_iter’ stands for the number of gradient steps, not the number of epochs, and I don’t know how I should interpret that in PyTorch!

For the MLP part and the activation function, you can use torch.nn modules such as nn.Linear and nn.ReLU.
For the L-BFGS solver, PyTorch supports it in torch.optim as torch.optim.LBFGS; its documentation lists the possible arguments.
For the validation_fraction argument, you can either create a validation set with sklearn.model_selection (e.g. train_test_split) or build one manually, using list comprehensions or PyTorch data loaders.
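Putting those pieces together, a minimal sketch (toy data and hypothetical layer sizes, not the original poster's setup) could look like:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data: 100 samples, 20 features, 3 classes (hypothetical shapes)
X = torch.randn(100, 20)
y = torch.randint(0, 3, (100,))

# MLP built from nn.Linear and nn.ReLU
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Linear(64, 3),
)
loss_fn = nn.CrossEntropyLoss()

optimizer = torch.optim.LBFGS(model.parameters(), lr=1.0, max_iter=20)

def closure():
    # L-BFGS may re-evaluate the loss several times per step,
    # so the forward/backward pass goes inside a closure
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    return loss

final_loss = optimizer.step(closure)
```

Note that unlike SGD or Adam, `optimizer.step` takes the closure as an argument and may call it multiple times within a single step.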

Thank you very much, but my main question is about the max_iter parameter. As far as I can tell, this parameter in sklearn (when the solver is “lbfgs”) doesn’t indicate the number of epochs but rather the number of gradient steps. Moreover, the parameter max_fun indicates the maximum number of loss function calls. How can I set these two parameters in torch.optim?

In torch.optim.LBFGS, max_iter plays the same role as sklearn’s max_iter, and sklearn’s max_fun corresponds to max_eval. From the documentation:

lr (float) – learning rate (default: 1)
max_iter (int) – maximal number of iterations per optimization step (default: 20)
max_eval (int) – maximal number of function evaluations per optimization step (default: max_iter * 1.25).
tolerance_grad (float) – termination tolerance on first order optimality (default: 1e-5).
tolerance_change (float) – termination tolerance on function value/parameter changes (default: 1e-9).
history_size (int) – update history size (default: 100).
line_search_fn (str) – either ‘strong_wolfe’ or None (default: None).
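So a constructor call mirroring sklearn’s max_iter=200 and max_fun=15000 (illustrative values, not defaults) would be:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in model; replace with your MLP

# Illustrative mapping: sklearn's max_iter -> max_iter, max_fun -> max_eval.
# With max_iter this large, a single optimizer.step(closure) call can run
# the whole optimization, playing the role of sklearn's .fit().
optimizer = torch.optim.LBFGS(
    model.parameters(),
    lr=1.0,
    max_iter=200,
    max_eval=15000,
    tolerance_grad=1e-5,
    tolerance_change=1e-9,
    history_size=100,
    line_search_fn='strong_wolfe',
)
```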