Loading optimizer dict starts training from initial LR

So I save my model as a checkpoint using the following code:

torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'train_loss': train_loss
}, SAVE_PATH + "Regressor_epoch{}_step{}.pkl".format(epoch, steps))

Once I load the optimizer using optimizer.load_state_dict() and resume training, my model starts from the initial LR, which is 0.001, and not 1e-9, where it left off. Given below is my training code:

writer = SummaryWriter()
model = model.cuda()
criterion = nn.SmoothL1Loss().cuda()

#optimizer = torch.optim.Adam(model.parameters(), lr=inf2['optimizer_state_dict']['param_groups'][0]['lr'])
# optimizer.state_dict()['param_groups']['params'] = inf2['optimizer_state_dict']['param_groups']['params']
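# inf2 and inference_dict below are assumed to be checkpoint dicts loaded earlier via torch.load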

optimizer.load_state_dict(inf2['optimizer_state_dict'])

best_val_loss = inference_dict['val_loss']
best_train_loss = inference_dict['train_loss']
    
scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=[4, 8, 15, 20, 45], gamma=0.1)

num_epochs = 150
running_loss = 0
steps = 0
print_every = 35
log_every = 10
log_step = 0


for epoch in range(num_epochs):
    model.train()
    scheduler.step()
    for data_ in trainloader:
        steps += 1
        img, bbox = data_
        
        img = Variable(img.cuda())
        target = Variable(bbox.cuda())
        
        output = model(img)
        loss = criterion(output, target)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

        train_loss = running_loss/steps

        writer.add_scalar('Training Loss', train_loss, steps)
        writer.add_scalar('Learning rate', optimizer.state_dict()['param_groups'][0]['lr'], epoch)

        if train_loss < best_train_loss:
            torch.save({
                'epoch': epoch,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'train_loss': train_loss
            }, SAVE_PATH + "Regressor_epoch{}_step{}.pkl".format(epoch, steps))
            best_train_loss = train_loss
            
        if steps % print_every == 0:
            print("Epoch: {}/{}.. ".format(epoch+1, num_epochs),
                  "Training Loss: {:.4f}.. ".format(train_loss),
                 "Learning Rate: {}".format(optimizer.state_dict()['param_groups'][0]['lr']))
writer.close()

Can you also try saving the scheduler state_dict, and then loading that along with the model and optimizer states?
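For reference, a minimal sketch of what that could look like with the checkpoint dict from the original post (same variable names assumed; only the extra 'scheduler_state_dict' key is new):

torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'scheduler_state_dict': scheduler.state_dict(),
    'train_loss': train_loss
}, SAVE_PATH + "Regressor_epoch{}_step{}.pkl".format(epoch, steps))

Below is a small REPL demo showing that the scheduler's last_epoch and the optimizer's lr survive a save/load round trip: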

>>> model = nn.Conv2d(3, 8, 3)
>>> optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
>>> sched = lr_scheduler.MultiStepLR(optimizer, milestones=[10, 20, 30], gamma=0.1, last_epoch=-1)
>>> for i in range(25):
...     sched.step()

## current state of sched and optimizer:
>>> optimizer
SGD (
Parameter Group 0
    dampening: 0
    initial_lr: 0.1
    lr: 0.0010000000000000002
    momentum: 0.9
    nesterov: False
    weight_decay: 0
)
>>> sched.state_dict()
{'milestones': [10, 20, 30], 'gamma': 0.1, 'base_lrs': [0.1], 'last_epoch': 24}

## saving the checkpoint:
>>> torch.save({'model_state_dict': model.state_dict(),
...             'optimizer_state_dict': optimizer.state_dict(),
...             'scheduler_state_dict': sched.state_dict()}, 'checkpoint.pth')

Now we can load all of them:

## loading the checkpoint
>>> checkpoint = torch.load('checkpoint.pth')

>>> model_2 = nn.Conv2d(3, 8, 3)
>>> optimizer_2 = optim.SGD(model_2.parameters(), lr=0.1, momentum=0.9)
>>> sched_2 = lr_scheduler.MultiStepLR(optimizer_2, milestones=[10, 20, 30], gamma=0.1, last_epoch=-1)

>>> model_2.load_state_dict(checkpoint['model_state_dict'])
>>> optimizer_2.load_state_dict(checkpoint['optimizer_state_dict'])
>>> sched_2.load_state_dict(checkpoint['scheduler_state_dict'])
>>> optimizer_2
SGD (
Parameter Group 0
    dampening: 0
    initial_lr: 0.1
    lr: 0.0010000000000000002
    momentum: 0.9
    nesterov: False
    weight_decay: 0
)
>>> sched_2.state_dict()
{'milestones': [10, 20, 30], 'gamma': 0.1, 'base_lrs': [0.1], 'last_epoch': 24}

So this shows that 'last_epoch' is correctly loaded. Now we can just run one more epoch for final verification:

>>> sched_2.step()
>>> sched_2.state_dict()
{'milestones': [10, 20, 30], 'gamma': 0.1, 'base_lrs': [0.1], 'last_epoch': 25}
>>> optimizer_2
SGD (
Parameter Group 0
    dampening: 0
    initial_lr: 0.1
    lr: 0.0010000000000000002
    momentum: 0.9
    nesterov: False
    weight_decay: 0
)

So, as you can see, the lr of the optimizer is not re-initialized.


I tried removing the scheduler itself from the code to see if it made any difference, but it didn’t.

I hadn’t saved the scheduler when I saved the model, so is there any way I can just load the optimizer state dictionary and continue training?

EDIT: I had only removed the scheduler.step() call, but I had not removed the following line:

scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=[4, 8, 15, 20, 45], gamma=0.1)

You have to remove this too.

In the future, it’s best to save the scheduler dict as @vmirly1 suggested.
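For completeness, here is a minimal sketch of what resuming could then look like (variable names follow the original post; CHECKPOINT_PATH and the 'scheduler_state_dict' key are assumptions based on the suggestion above):

# Sketch: restore model, optimizer, and scheduler before resuming training.
checkpoint = torch.load(CHECKPOINT_PATH)   # CHECKPOINT_PATH is hypothetical

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=[4, 8, 15, 20, 45], gamma=0.1)

model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
scheduler.load_state_dict(checkpoint['scheduler_state_dict'])

start_epoch = checkpoint['epoch'] + 1      # resume from the next epoch
for epoch in range(start_epoch, num_epochs):
    ...                                    # training loop as before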

I recently stumbled on this issue… I don’t know if this is “best practice”, but my solution was to subtract the current epoch / starting_epoch (which is saved along with the model and optimizer, or is 0 when starting from scratch) from the milestones (if the milestones are expressed in epochs), e.g.:

milestones = [epoch - starting_epoch for epoch in milestones]
scheduler = lr_scheduler.MultiStepLR(optim, milestones, gamma=0.1)

I guess the scheduler goes through all of the milestones, even negative ones, and for every milestone that is smaller than the current epoch, the learning rate is multiplied by gamma. (I was a bit surprised by this!)
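As a concrete (hypothetical) illustration with the milestones from the original post, assuming the closed-form computation used by older MultiStepLR versions (lr = base_lr * gamma ** bisect_right(milestones, last_epoch)), which is the behaviour described above:

from bisect import bisect_right

# Hypothetical numbers, reusing the milestones from the original post.
milestones = [4, 8, 15, 20, 45]
starting_epoch = 10            # epoch we are resuming from
base_lr, gamma = 0.001, 0.1

shifted = [m - starting_epoch for m in milestones]   # [-6, -2, 5, 10, 35]

# The two negative milestones are counted immediately at last_epoch == 0,
# so the lr starts where a run that had already passed epochs 4 and 8 would be.
lr_at_resume = base_lr * gamma ** bisect_right(shifted, 0)
print(lr_at_resume)   # ~1e-05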

The for-loop below should also work (untested), but is probably less efficient:

for _ in range(current_epoch):
    scheduler.step()

This might save a bit of space in the checkpoint file…

Note that if you have not saved the scheduler, you can still fix this problem. Just step the new scheduler in a for-loop until it reaches the final epoch of the last saved model.
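For instance, a minimal sketch using the 'epoch' key that the checkpoint from the original post already stores (variable and key names assumed):

# Fast-forward a freshly created scheduler to the saved epoch.
scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=[4, 8, 15, 20, 45], gamma=0.1)
for _ in range(checkpoint['epoch']):   # adjust by one depending on where scheduler.step() sits in your loop
    scheduler.step()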
