Implement an MLP classifier network with hyper-parameters obtained from sklearn

I am trying to implement an MLP network with the L-BFGS optimizer to solve a classification problem. I have implemented a network in sklearn with the following configuration:

MLPClassifier(activation='relu', alpha=1e-5,
              hidden_layer_sizes=(40, 40, 40),
              max_fun=15000, max_iter=1500,
              solver='lbfgs', tol=0.0001,
              validation_fraction=0.1, verbose=False, warm_start=False)

But I don’t know how these hyperparameters should be configured in PyTorch. For example, with the “lbfgs” solver, max_iter stands for the number of gradient steps, not the number of epochs, and I don’t know how I should interpret that in PyTorch!

Hi,

For the MLP part and the activation function, you can use torch.nn modules such as nn.Linear and nn.ReLU (see the sketch after this list).
For the L-BFGS solver, PyTorch supports it in torch.optim. You can see the possible arguments here.
For the validation_fraction argument, you can either create a validation set using sklearn.model_selection or build one manually with list comprehensions or even PyTorch data loaders.
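As an illustration, here is a minimal sketch of the equivalent (40, 40, 40) ReLU architecture; n_features and n_classes are placeholders you would replace with your dataset’s dimensions:

import torch.nn as nn

# Hypothetical input/output sizes -- replace with your dataset's values.
n_features = 20
n_classes = 3

# Equivalent of hidden_layer_sizes=(40, 40, 40) with activation='relu'.
# MLPClassifier applies softmax internally; in PyTorch, leave the last
# layer linear and pair it with nn.CrossEntropyLoss.
model = nn.Sequential(
    nn.Linear(n_features, 40), nn.ReLU(),
    nn.Linear(40, 40), nn.ReLU(),
    nn.Linear(40, 40), nn.ReLU(),
    nn.Linear(40, n_classes),
)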

Best

Thank you very much, but my main question is about the max_iter parameter. As I have noticed, this parameter in sklearn (when the solver is “lbfgs”) doesn’t indicate the number of epochs but the number of gradient steps. Moreover, the parameter max_fun indicates the maximum number of loss-function calls. How can I set these two parameters in torch.optim?

If you visit the link I provided, you can see the definition of optim.LBFGS, which is:

torch.optim.LBFGS(params, lr=1, max_iter=20, max_eval=None, tolerance_grad=1e-07, tolerance_change=1e-09, history_size=100, line_search_fn=None)

where max_iter is the same as in sklearn, and sklearn’s max_fun corresponds to max_eval (see the snippet after the parameter list below).

lr (float) – learning rate (default: 1)
max_iter (int) – maximal number of iterations per optimization step (default: 20)
max_eval (int) – maximal number of function evaluations per optimization step (default: max_iter * 1.25).
tolerance_grad (float) – termination tolerance on first order optimality (default: 1e-7).
tolerance_change (float) – termination tolerance on function value/parameter changes (default: 1e-9).
history_size (int) – update history size (default: 100).
line_search_fn (str) – either ‘strong_wolfe’ or None (default: None).
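To make this concrete, here is a rough translation of the sklearn configuration above into a torch.optim.LBFGS call. It assumes hypothetical full-batch tensors X_train and y_train (sklearn’s lbfgs solver also runs full-batch), and the tolerance mapping is only approximate, so treat it as a sketch rather than an exact equivalent:

import torch
import torch.nn as nn

# Hypothetical data and dimensions -- replace with your own.
n_features, n_classes = 20, 3
X_train = torch.randn(500, n_features)
y_train = torch.randint(0, n_classes, (500,))

# Same (40, 40, 40) ReLU architecture as in the earlier sketch.
model = nn.Sequential(
    nn.Linear(n_features, 40), nn.ReLU(),
    nn.Linear(40, 40), nn.ReLU(),
    nn.Linear(40, 40), nn.ReLU(),
    nn.Linear(40, n_classes),
)

# max_iter=1500 mirrors sklearn's max_iter (gradient steps) and
# max_eval=15000 mirrors max_fun (loss-function calls). With the full
# budget in max_iter, one optimizer.step(closure) runs the whole fit.
optimizer = torch.optim.LBFGS(
    model.parameters(),
    max_iter=1500,
    max_eval=15000,
    tolerance_grad=1e-4,            # rough analogue of sklearn's tol=0.0001
    history_size=100,
    line_search_fn='strong_wolfe',  # sklearn's scipy backend also uses a Wolfe line search
)

criterion = nn.CrossEntropyLoss()

# L-BFGS re-evaluates the loss several times per step, so it needs a closure.
def closure():
    optimizer.zero_grad()
    loss = criterion(model(X_train), y_train)
    loss.backward()
    return loss

optimizer.step(closure)

Note that optim.LBFGS has no weight_decay argument, so sklearn’s alpha (the L2 penalty) would have to be added to the loss inside the closure.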