Is it possible to change the optimizer from Adam to SGD to continue training?

Save the state_dict of the model and of Adam, and pause training. Then load the model's state_dict and change the optimizer to SGD.

SGD does not keep track of any extra per-parameter state (unless you use momentum). This means you can simply create a new SGD optimizer and ignore the saved Adam state.

# save the model weights and the current Adam state, then pause training
torch.save({'model': model.state_dict(), 'optim': optim.state_dict()}, '...')

To switch to SGD, use:

state_dict = torch.load('...')
model.load_state_dict(state_dict['model'])
optim = torch.optim.SGD(model.parameters(), lr=new_learning_rate)
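For completeness, a minimal end-to-end sketch of this workflow might look like the following. The checkpoint path 'checkpoint.pth', the toy model, and the learning rates are placeholders I chose for illustration, not part of the original post:

import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for your real model
adam = torch.optim.Adam(model.parameters(), lr=1e-3)

# ... train with Adam for some epochs, then pause and save both states
torch.save({'model': model.state_dict(), 'optim': adam.state_dict()}, 'checkpoint.pth')

# later: restore the weights and continue with a fresh SGD optimizer
state_dict = torch.load('checkpoint.pth')
model.load_state_dict(state_dict['model'])
sgd = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

The saved Adam state ('optim') is simply ignored here; only the model weights carry over, and SGD starts with fresh momentum buffers.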

Thank you!! 🥰


Your idea is to use Adam for fast initial training and then switch to SGD at the end. That is a good idea, and there is a paper, "Adaptive Gradient Methods with Dynamic Bound of Learning Rate" (ICLR 2019), based on the same idea.

As described in the paper, AdaBound is an optimizer that behaves like Adam at the beginning of training and gradually transforms into SGD at the end.

Please refer to this code: GitHub - Luolc/AdaBound: An optimizer that trains as fast as Adam and as good as SGD.
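If you prefer that route over switching optimizers by hand, the repo's README shows usage roughly along these lines (the lr and final_lr values are just the README's example defaults; you need to install the adabound package first):

import adabound

# behaves like Adam early in training (adaptive steps scaled by lr) and
# gradually bounds the per-parameter step size toward an SGD-like final_lr
optimizer = adabound.AdaBound(model.parameters(), lr=1e-3, final_lr=0.1)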