What is the purpose of the `closure` argument for `optimizer.step`?

Looking at the docs, I noticed that every `optimizer.step` method has an (optional) `closure` argument (for LBFGS it's even required). The info text is:

A closure that reevaluates the model and returns the loss.

It is not completely clear to me which steps should be taken in this closure function. I reviewed some examples (for example here and there), and it looks like, within the closure, we should (see the sketch after this list):

  • Zero the gradients
  • Compute the loss
  • Backprop on the loss
  • Return the loss
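
Concretely, I imagine something like this (a minimal sketch of my understanding; the `Linear` model, MSE loss, random data, and learning rate are just placeholders I made up):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(4, 10)
targets = torch.randn(4, 1)

def closure():
    optimizer.zero_grad()                     # zero the gradients
    loss = criterion(model(inputs), targets)  # reevaluate the model, compute the loss
    loss.backward()                           # backprop on the loss
    return loss                               # return the loss

optimizer.step(closure)
```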

So my questions are:

  1. Is the above list of (minimally required) steps to take within the closure function correct and complete?
  2. What is the purpose of returning the loss? What does the optimizer do with it? Should we return the loss we backpropagated on, or create a new one? (The two examples handle this differently; maybe it doesn't matter.)
  3. For an optimizer for which the closure argument is optional, is `closure(); optimizer.step()` equivalent to `optimizer.step(closure)`? (See the snippet after this list.)
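
To make question 3 concrete, I mean whether these two variants do the same thing (reusing the `closure` and `optimizer` from the sketch above):

```python
# variant A: run the closure manually, then step without arguments
loss = closure()
optimizer.step()

# variant B: let the optimizer invoke the closure itself
loss = optimizer.step(closure)
```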

Thanks for your input!

Did you ever figure this out? I have a similar question.

I'm not completely sure about the internals, but I've used the closure that way successfully. So, to comment on my own questions:

  1. I performed those steps in the closure and it worked, so the list seems to be sufficient.
  2. Some optimizers (e.g. LBFGS) decide whether to terminate based on the value of the loss, so they need access to it; that's why the closure should return it.
  3. I suppose those two versions are equivalent, at least for optimizers that evaluate the closure only once per step.

Some optimizers (again, LBFGS) need to evaluate the model multiple times within a single step; passing a closure to `step` is what makes that possible.
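
Here is how I've been using it with LBFGS (a minimal runnable sketch; the `Linear` model, random data, and epoch count are made up for illustration):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.LBFGS(model.parameters(), max_iter=20)
inputs = torch.randn(8, 10)
targets = torch.randn(8, 1)

def closure():
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    return loss

for epoch in range(10):
    # a single step() may call closure() many times internally
    # (inner iterations / line search), which is why the optimizer
    # needs the closure rather than a precomputed loss
    loss = optimizer.step(closure)
    print(epoch, loss.item())
```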

Anyway, I would be glad if someone with more in-depth knowledge could comment on these questions and clarify the situation.