What is the purpose of the `closure` argument for `optimizer.step`?

Looking at the docs, I realized that every optimizer’s `step` method has an (optional) `closure` argument (for LBFGS it’s even required). The info text is:

A closure that reevaluates the model and returns the loss.

For me it is not completely clear which steps should be taken in this closure function. I reviewed some examples (for example here and there), and it looks like within the closure we should (see the sketch after the list):

  • Zero the gradients
  • Compute the loss
  • Backprop on the loss
  • Return the loss

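For reference, a minimal sketch of such a closure, assuming `model`, `criterion`, `inputs`, `targets`, and `optimizer` are already defined (the names are just placeholders):

```python
def closure():
    optimizer.zero_grad()                 # zero the gradients
    output = model(inputs)                # forward pass...
    loss = criterion(output, targets)     # ...and compute the loss
    loss.backward()                       # backprop on the loss
    return loss                           # return the loss

optimizer.step(closure)
```
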
So my questions are:

  1. Is the above list of (minimum required) steps that should be taken within the closure function correct and complete?
  2. What is the purpose of returning the loss? What does the optimizer do with it? Should we return the loss that we backpropagated on or create a new one (the two examples handle this situation differently; maybe it doesn’t matter)?
  3. For an optimizer for which the closure argument is optional, is `closure(); optimizer.step()` similar to `optimizer.step(closure)`?

Thanks for your input!

Have you figured it out? I have a similar question.

I’m not completely sure about the internals, but I’ve used the closure that way successfully. So, to comment on my own questions:

  1. I performed those steps in the closure and it worked.
  2. Some optimizers (e.g. LBFGS) terminate depending on the value of the loss, which is why they need access to it and hence why it should be returned.
  3. I suppose those two versions are similar.

Some optimizers (again, LBFGS) need to evaluate the model multiple times per step, and the closure makes that possible.
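
To make that concrete, here is a rough sketch of how a closure is typically handed to LBFGS; the model, loss, and data here are just toy placeholders:

```python
import torch

model = torch.nn.Linear(10, 1)
criterion = torch.nn.MSELoss()
inputs, targets = torch.randn(32, 10), torch.randn(32, 1)

optimizer = torch.optim.LBFGS(model.parameters(), max_iter=20)

def closure():
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    return loss

# A single call to step() may invoke `closure` several times
# (up to max_iter function evaluations) while it searches for an update.
loss = optimizer.step(closure)
```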

Anyway, I would be glad if someone with more in-depth knowledge could comment on these questions and clarify the situation.

This is definitely an ultra-late reply, but I have just come across this question myself. After looking at the code a bit, I am hoping to add a tad more clarity, but a researcher would know more!

The closure is mostly, if not entirely, needed only for LBFGS. I have not seen another use case for it in PyTorch where the typical way without the closure was not sufficient. (By the typical way, I mean the training loop of `zero_grad`, forward to get the loss, backprop, `optimizer.step`.)
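
For comparison, that typical loop looks roughly like this (placeholder names again):

```python
for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()   # no closure needed for optimizers like SGD or Adam
```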

  1. In LBFGS, the closure is used to return the loss and update the gradients, so those two things would be the minimum requirement. In a typical model, the four steps you mentioned would suffice, but you could imagine cruder (though less useful) ways to update the grads and return a loss.

  2. As you’ve concluded, LBFGS repeatedly recalculates the loss and uses it, together with the updated grads, to evaluate a direction until certain conditions are met. Internally, as of today, LBFGS immediately converts the losses to floats, so for LBFGS internals it doesn’t matter whether you return the literal loss tensor instance or a new one. Its `step` will return the original loss unprocessed, though, so if you intend to use the loss returned by `step`, the distinction would matter.

  3. Yes, `closure(); optimizer.step()` is similar to `optimizer.step(closure)`, though it ultimately depends on how you define your closure. In most cases I’ve seen, this has been true, but I am not sure why people would use closures for anything other than LBFGS. One difference I did observe is that if you do pass in a closure, `step` will return the loss returned by the closure (see the sketch below). Maybe this is useful, but it does not seem commonly used.
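
To illustrate points 2 and 3, here is a small sketch with SGD (where the closure is optional); the setup is purely illustrative:

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs, targets = torch.randn(8, 4), torch.randn(8, 1)

def closure():
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    return loss

# Variant A: call the closure yourself, then step without arguments.
closure()
optimizer.step()                  # returns None

# Variant B: hand the closure to step(); it is called once and its
# return value is passed back to you.
returned_loss = optimizer.step(closure)
print(returned_loss)              # the loss tensor returned by `closure`
```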