Confusing nature of optim.LBFGS


I’d like to use LBFGS as an alternative to scipy.optimize.fmin_l_bfgs_b. I’m currently using optimizer = torch.optim.LBFGS([U], lr=0.1, max_iter=50, line_search_fn='strong_wolfe') as my optimizer. However, I don’t understand why a learning-rate parameter is needed when a line-search mechanism is used. I also observe a stark difference in results between scipy’s function and torch’s. My objective is polyconvex, as explained in this paper. Please also see the scipy implementation in Listing 1, which I’m porting to PyTorch. The reason for porting is that when I use this on different object categories, I don’t want to compute analytical gradients. Thanks for any tips/suggestions.
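
For what it's worth, a minimal sketch of the closure pattern torch.optim.LBFGS expects (the quadratic objective, and the names A, b, U, are illustrative stand-ins, not the actual polyconvex objective from the post). Note that with line_search_fn='strong_wolfe', lr only scales the trial step handed to the line search; the accepted step length is then chosen by the Wolfe conditions, so lr=1.0 is the common choice, which may explain part of the discrepancy versus scipy:

```python
import torch

torch.manual_seed(0)
A = torch.randn(20, 5)
b = torch.randn(20)
U = torch.zeros(5, requires_grad=True)

# With strong-Wolfe line search, lr=1.0 lets the line search pick the step.
optimizer = torch.optim.LBFGS([U], lr=1.0, max_iter=50,
                              line_search_fn='strong_wolfe')

def closure():
    # LBFGS may evaluate the objective several times per iteration,
    # so loss and gradients must be recomputed inside the closure.
    optimizer.zero_grad()
    loss = torch.sum((A @ U - b) ** 2)
    loss.backward()  # autograd supplies gradients; no analytical ones needed
    return loss

optimizer.step(closure)  # one .step() call runs up to max_iter iterations

with torch.no_grad():
    final_loss = torch.sum((A @ U - b) ** 2)
print(final_loss.item())
```

Unlike scipy.optimize.fmin_l_bfgs_b, which loops internally until convergence, a single optimizer.step(closure) here runs at most max_iter inner iterations, so the two can stop at different points even on the same objective.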