Save the state_dicts of the model and the Adam optimizer, and pause training. Then load the model's state_dict and switch the optimizer to SGD.
SGD does not keep extra per-parameter state (unless you use momentum), so you can simply create a new SGD optimizer without loading any optimizer state.
torch.save({'model': model.state_dict(), 'optim': optim.state_dict()}, '...')
To switch to SGD, use:
state_dict = torch.load('...')
model.load_state_dict(state_dict['model'])
optim = torch.optim.SGD(model.parameters(), lr=new_learning_rate)
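Putting the steps above together, here is a minimal end-to-end sketch. The model, checkpoint filename, and learning rates are illustrative assumptions, not part of the original post:

```python
# Sketch: checkpoint a model trained with Adam, then resume with SGD.
# The model architecture, file path, and learning rates are made up
# for illustration.
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

# ... train with Adam for a while, then checkpoint both state_dicts ...
torch.save({'model': model.state_dict(), 'optim': optim.state_dict()},
           'checkpoint.pt')

# Later: restore the weights and start fresh with SGD. Adam's moment
# buffers saved under 'optim' are simply never loaded; plain SGD
# (without momentum) keeps no per-parameter state, so nothing carries over.
state_dict = torch.load('checkpoint.pt')
model.load_state_dict(state_dict['model'])
optim = torch.optim.SGD(model.parameters(), lr=1e-2)
```

If you did want momentum SGD, its momentum buffers would start from zero, which usually just means a few warm-up steps after the switch.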
Thank you!!
Your idea is to use Adam for fast initial training and then switch to SGD near the end. That is a good idea, and there is a paper with the same motivation: "Adaptive Gradient Methods with Dynamic Bound of Learning Rate. In Proc. of ICLR 2019."
As described in the paper, AdaBound is an optimizer that behaves like Adam at the beginning of training and gradually transforms into SGD at the end.
Please refer to this code: GitHub - Luolc/AdaBound: An optimizer that trains as fast as Adam and as good as SGD.