When you optimize with LBFGS, you have to wrap the forward and backward pass in a closure, so each step looks something like the following:
```python
def closure():
    optimizer.zero_grad()
    prediction = model(data)
    loss = criterion(prediction, target)
    loss.backward()
    return loss

optimizer.step(closure)
```
Because the loss only exists inside the closure, it's awkward to record the current loss on each step; outside the closure you can't just do `losses += [loss.detach().item()]`, since `loss` is not in scope there (see the sketch below).
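For concreteness, a minimal sketch of the problem, reusing the names from the snippet above (the surrounding loop and `num_steps` are placeholders):

```python
losses = []
for step in range(num_steps):
    optimizer.step(closure)
    # `loss` was local to closure(), so it is not visible here;
    # uncommenting this line would raise a NameError:
    # losses += [loss.detach().item()]
```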
One option is to re-evaluate the model and criterion each time outside of the closure (under `torch.no_grad()`), but this is a waste of compute.
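That wasteful workaround would look something like this (a sketch under the same placeholder names as above):

```python
losses = []
for step in range(num_steps):
    optimizer.step(closure)
    # Extra forward pass purely for logging -- duplicates the
    # compute that closure() already did inside step()
    with torch.no_grad():
        losses.append(criterion(model(data), target).item())
```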
Is there a better way?