When you estimate a model with LBFGS, you have to wrap the forward and backward passes in a closure that the optimizer can re-evaluate, so each optimization step looks something like the following:
```python
def closure():
    optimizer.zero_grad()                 # clear gradients from the previous evaluation
    prediction = model(data)
    loss = criterion(prediction, target)
    loss.backward()
    return loss

# LBFGS may call the closure several times per step (inner iterations, line search)
optimizer.step(closure)
```
Having the loss inside the closure makes it awkward to stash the current loss at each iteration (i.e. you can't just do `losses += [loss.detach().item()]` after the step, since `loss` is local to the closure).
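For instance, the obvious workaround of appending from inside the closure over-counts, because LBFGS may evaluate the closure several times per `.step()` call. A minimal sketch, reusing the `model`/`criterion`/`optimizer`/`data`/`target` names from above:

```python
losses = []

def closure():
    optimizer.zero_grad()
    prediction = model(data)
    loss = criterion(prediction, target)
    loss.backward()
    # runs once per closure evaluation, which can be many times
    # per optimizer.step() (inner iterations, line search)
    losses.append(loss.detach().item())
    return loss

optimizer.step(closure)
```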
One option is to re-evaluate the model and criterion each iteration outside of the closure (with `torch.no_grad()`), but this wastes a forward pass.
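That is, something like this sketch (again reusing the names above, and assuming the `losses` list from earlier):

```python
optimizer.step(closure)

# a second forward pass purely to read off the current loss
with torch.no_grad():
    losses.append(criterion(model(data), target).item())
```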
Is there a better way?