Why does LBFGS scale the initial step size?

I’m currently using the LBFGS optimization algorithm without line search and noticed that the initial step size is scaled relative to the gradient:

############################################################
# compute step length
############################################################
# reset initial guess for step size
if state['n_iter'] == 1:
    t = min(1., 1. / flat_grad.abs().sum()) * lr
else:
    t = lr
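
To make the behaviour concrete, here is a small standalone sketch (the gradient values and lr below are made up, and this is not the optimizer's own loop) showing that the formula shrinks the very first step when the gradient's L1 norm is large, and leaves it at lr when that norm is below 1:

import torch

# Hypothetical inputs, purely for illustration of the first-iteration formula above.
lr = 1.0

for flat_grad in (torch.tensor([100.0, -50.0]),   # large gradient: step gets shrunk
                  torch.tensor([1e-3, 2e-3])):    # small gradient: step stays at lr
    # same formula as in the quoted snippet
    t = min(1., 1. / flat_grad.abs().sum().item()) * lr
    print(f"L1 norm of grad = {flat_grad.abs().sum().item():.4f} -> t = {t:.6f}")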

Does anyone know the reasoning behind this? Most implementations I’ve seen that don’t use a line search employ a fixed step size. I’ve also looked through the relevant chapters of [Jorge Nocedal, Stephen Wright: Numerical Optimization] and found no discussion of scaling the step size in this way.

Thanks!