Switch from LBFGS to Adam optimizer in the middle of training

#I want to switch the optimizer in the middle of training,
#from LBFGS to Adam.
#Below is the code that I wrote.

from torch import optim

optimizer = optim.LBFGS(model.parameters(), lr=0.003)
Use_Adam_optim_FirstTime = True
Use_LBFGS_optim = True

for epoch in range(30000):
    loss_SUM = 0
    for i, (x, t) in enumerate(GridLoader):
        x = x.to(device)
        t = t.to(device)

        if Use_LBFGS_optim:
            def closure():
                optimizer.zero_grad()
                lg, lb, li = problem_formulation(x, t, x_Array, t_Array, bndry, pi)
                loss_total = lg + lb + li
                loss_total.backward(retain_graph=True)
                return loss_total

            loss_out = optimizer.step(closure)
            loss_SUM += loss_out.item()

        elif Use_Adam_optim_FirstTime:
            Use_Adam_optim_FirstTime = False
            optimizerAdam = optim.Adam(model.parameters(), lr=0.0003)
            model.load_state_dict(checkpoint['model'])
            optimizerAdam.zero_grad()
            lg, lb, li = problem_formulation(x, t, x_Array, t_Array, bndry, pi)
            lg.backward()
            lb.backward()
            li.backward()
            optimizerAdam.step()
            loss_SUM += lg.item() + lb.item() + li.item()

        else:
            optimizerAdam.zero_grad()
            lg, lb, li = problem_formulation(x, t, x_Array, t_Array, bndry, pi)
            lg.backward()
            lb.backward()
            li.backward()
            optimizerAdam.step()
            loss_SUM += lg.item() + lb.item() + li.item()

    if loss_SUM < 0.3 and Use_LBFGS_optim:
        Use_LBFGS_optim = False
        checkpoint = {'model': model.state_dict(),
                      'optimizer': optimizer.state_dict()}

#My questions are:
#Regarding the closure function:
#(1) Is there a way that I can return more than one variable from the closure function?
#(2) Is there a way that I can call backward three times inside the closure instead of only once?
#    (A sketch of what I mean by (1) and (2) follows these questions.)
#(3) Why do I need to set retain_graph to True in loss_total.backward(retain_graph=True) inside the closure?
#(4) Sometimes, loss_SUM for the closure approaches 1e+29. Is there a way to avoid this problem?
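
#For (1) and (2), this is the kind of pattern I have in mind: store the individual losses in a
#container defined outside the closure, and let the closure return only the summed scalar that
#LBFGS needs. Just a sketch using the same names as above; loss_parts is a name I made up here:

loss_parts = {}

def closure():
    optimizer.zero_grad()
    lg, lb, li = problem_formulation(x, t, x_Array, t_Array, bndry, pi)
    # stash the pieces in the enclosing dict so they can be read after optimizer.step(closure)
    loss_parts['lg'], loss_parts['lb'], loss_parts['li'] = lg.item(), lb.item(), li.item()
    loss_total = lg + lb + li
    # one backward on the sum covers all three terms
    loss_total.backward(retain_graph=True)
    return loss_total

loss_out = optimizer.step(closure)
print(loss_parts['lg'], loss_parts['lb'], loss_parts['li'])
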
#-----------------------------
#Regarding the switch from LBFGS to Adam:
#(5) When loss_SUM < 0.3 and we reach the "elif" branch, the loss decreases at first; however, after one epoch,
#loss_SUM increases dramatically (e.g., by a factor of 20).
#What is the correct way of switching the optimizer from LBFGS to Adam? (A sketch of what I have in mind is below.)
#In general, what are the problems with the above code, and how can I improve it?
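
#For (5), this is roughly the restructuring I am considering: keep one LBFGS optimizer, build the
#Adam optimizer once at the moment the threshold is crossed, and use a single backward on the
#summed loss in the Adam branch as well. Just a sketch reusing the names from the code above,
#not tested:

optimizer = optim.LBFGS(model.parameters(), lr=0.003)
optimizerAdam = None
Use_LBFGS_optim = True

for epoch in range(30000):
    loss_SUM = 0
    for i, (x, t) in enumerate(GridLoader):
        x = x.to(device)
        t = t.to(device)

        if Use_LBFGS_optim:
            def closure():
                optimizer.zero_grad()
                lg, lb, li = problem_formulation(x, t, x_Array, t_Array, bndry, pi)
                loss_total = lg + lb + li
                loss_total.backward(retain_graph=True)
                return loss_total

            loss_SUM += optimizer.step(closure).item()
        else:
            optimizerAdam.zero_grad()
            lg, lb, li = problem_formulation(x, t, x_Array, t_Array, bndry, pi)
            loss_total = lg + lb + li
            loss_total.backward()  # one backward on the summed loss
            optimizerAdam.step()
            loss_SUM += loss_total.item()

    # switch once, at the end of the epoch that crossed the threshold;
    # the model parameters are left as they are, only the optimizer object changes
    if Use_LBFGS_optim and loss_SUM < 0.3:
        Use_LBFGS_optim = False
        optimizerAdam = optim.Adam(model.parameters(), lr=0.0003)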