I’m currently using the LBFGS optimizer without line search and noticed that, on the first iteration, the initial step size is scaled relative to the gradient:
############################################################
# compute step length
############################################################
# reset initial guess for step size
if state['n_iter'] == 1:
    t = min(1., 1. / flat_grad.abs().sum()) * lr
else:
    t = lr
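For concreteness, here is a minimal sketch (not the optimizer itself, just the arithmetic of that line) showing what the scaling does on the first iteration. When the gradient is large, the step length shrinks to lr / ||g||_1, so the first update has an L1 norm of roughly lr; when the gradient already has small magnitude, the min(1., ...) cap leaves t = lr unchanged:

import torch

lr = 1.0
flat_grad = torch.tensor([50.0, -30.0, 20.0])   # |g|_1 = 100
t = min(1., 1. / flat_grad.abs().sum()) * lr
print(t)                              # ~0.01
print((t * flat_grad).abs().sum())    # ~1.0, i.e. roughly lr

So, if I read it correctly, this just keeps the very first step from being huge when the initial gradient is large, before any curvature information is available.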
Does anyone know the reasoning for this? Most implementations I’ve seen that don’t use a line search employ a fixed step size. I’ve also looked in the relevant chapters of [Jorge Nocedal, Stephen Wright: Numerical Optimization] and found no discussion of scaling the step size in this way.
Thanks!