I don't understand exactly what max_iter does in the LBFGS algorithm when optimizing the parameters of a neural network. Isn't max_iter the maximum number of data points, so that it depends on the mini-batch size when training a network like this:
    optimizer = torch.optim.LBFGS(model.parameters(), lr=0.1, max_iter=10)
    for epoch in range(epochs):
        for i, (images, labels) in enumerate(train_loader):
            ...
So how should max_iter be chosen when a neural network is trained this way? Does an optimal choice exist?
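
For context, the PyTorch documentation describes max_iter as the maximal number of iterations per optimization step, so it appears to bound the internal L-BFGS iterations run inside each optimizer.step(closure) call rather than count data points. The sketch below is one way to check this by counting how often the closure is evaluated during a single step. The toy tensors, the nn.Linear model, and the calls counter are hypothetical stand-ins for one mini-batch from train_loader, not part of the original training code.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy data standing in for one mini-batch (images, labels).
x = torch.randn(32, 4)
y = torch.randn(32, 1)

# Hypothetical small model; any nn.Module would do here.
model = torch.nn.Linear(4, 1)
loss_fn = torch.nn.MSELoss()

optimizer = torch.optim.LBFGS(model.parameters(), lr=0.1, max_iter=10)

# Counter for how many times LBFGS evaluates the loss in one step.
calls = {"n": 0}

def closure():
    # LBFGS calls this repeatedly to re-evaluate loss and gradients.
    calls["n"] += 1
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    return loss

# One optimization step on one mini-batch; internally this runs
# up to max_iter L-BFGS iterations, or fewer if a tolerance is met.
optimizer.step(closure)

print("closure evaluations in one step:", calls["n"])
```

In this sketch the number of closure evaluations is governed by max_iter (and the related max_eval and tolerance settings), not by the 32 samples in the batch. Inside a mini-batch loop, each call to optimizer.step(closure) would repeat this for the current batch.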