I am not sure how models and optimizers work together in PyTorch.
Here is the situation:
I save my model (the full model, not just its state_dict) and I save the optimizer's state_dict:
torch.save({
    'model': model,  # saves the whole model
    'optimizer_state_dict': optimizer.state_dict(),
    'lr_scheduler_state_dict': lr_scheduler.state_dict(),
}, save_path)
Then I load my model, freeze some layers, define the optimizer again, and load the optimizer's state_dict…
ckpt = torch.load(save_path)
model = ckpt['model']
for name, param in model.named_parameters():
    if ('layer4' in name) or ('fc' in name):
        param.requires_grad = True
    else:
        param.requires_grad = False
optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=lr)
optimizer.load_state_dict(ckpt['optimizer_state_dict'])
exp_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)
exp_lr_scheduler.load_state_dict(ckpt['lr_scheduler_state_dict'])  # must match the key used in torch.save
It throws an ugly error: ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group
Can you please help me correct it?
And why do we even need to save the state_dicts of the optimizer and the scheduler?
I understand now that the error is caused by the mismatch of the parameter groups,
but I cannot remove the filter, since I am freezing some layers of the model.
What if, after freezing the layers, I define a new optimizer?
Would that be any different from loading the state_dict of the previously saved optimizer?
As long as you set requires_grad=False before the forward pass and call optimizer.zero_grad(), the frozen parameters won't be updated.
The Adam optimizer adapts the learning rate per parameter, so it carries internal state that changes over training; a freshly created optimizer and a trained one are different. Generally, if you want to continue training, load from the state_dict.
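One way to reconcile loading the saved state with freezing (a sketch with a toy model, not code from this thread): build the new optimizer over all parameters, so its single parameter group matches the saved one, and only then freeze. Frozen parameters accumulate no gradient, and Adam skips any parameter whose .grad is None, so they stay fixed.

```python
import torch
from torch import nn, optim

# toy model standing in for the real network (hypothetical)
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# one training step so the optimizer accumulates internal state
model(torch.randn(8, 4)).sum().backward()
optimizer.step()
ckpt = {'optimizer_state_dict': optimizer.state_dict()}

# resume: build the optimizer over ALL parameters first, so its single
# parameter group matches the saved one -- no size-mismatch error
optimizer = optim.Adam(model.parameters(), lr=1e-3)
optimizer.load_state_dict(ckpt['optimizer_state_dict'])

# now freeze the first layer
for p in model[0].parameters():
    p.requires_grad = False

frozen_before = model[0].weight.clone()
# set_to_none=True matters: with grads zeroed to tensors (not None),
# Adam's momentum would still move the frozen weights
optimizer.zero_grad(set_to_none=True)
model(torch.randn(8, 4)).sum().backward()
optimizer.step()  # Adam skips parameters whose .grad is None
assert torch.equal(model[0].weight, frozen_before)  # frozen layer untouched
```

This keeps Adam's per-parameter state for the layers that are still training, instead of discarding it by constructing a filtered optimizer.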
Hi @liyz15
Thanks for the explanation once again!
But what do you suggest I do in my case, where I have to freeze layers after some epochs of training?
Do you suggest defining a new optimizer after freezing the layers of the model?
What's the purpose of freezing layers? If you are trying to fine-tune on a different dataset, a new one is preferred. If it's a training technique to freeze some layers during training, then continue with the same one.
Yes, freezing layers is a training technique. So I train the model for, let's say, 5 epochs on the last 3 layers, then train further for 3 epochs on only the last 2 layers (freezing the third-last layer), and so on…
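That progressive-freezing schedule could be sketched roughly like this (toy layers and made-up epoch counts, not your actual model):

```python
import torch
from torch import nn, optim

# toy "backbone": three trainable stages standing in for real layers
model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8), nn.Linear(8, 2))

# schedule: (epochs, indices of the layers kept trainable)
schedule = [(5, [0, 1, 2]),  # first, train the last 3 layers
            (3, [1, 2])]     # then freeze the third-last, keep the last 2

data = torch.randn(16, 8)
for epochs, trainable in schedule:
    for i, layer in enumerate(model):
        for p in layer.parameters():
            p.requires_grad = i in trainable
    # one option: a new optimizer per phase, over the currently trainable params
    optimizer = optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-3)
    for _ in range(epochs):
        optimizer.zero_grad()
        model(data).sum().backward()
        optimizer.step()
```

Note this variant recreates the optimizer each phase, so Adam's accumulated state is discarded at every phase boundary.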
And saving model.state_dict() does not save the requires_grad attribute of the model's parameters, whereas saving the entire model does. Saving the entire model works as long as you are not changing the architecture of the model itself.
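A quick check of that claim with a toy module: a state_dict holds tensor values only, and load_state_dict copies values into the receiving model without touching its requires_grad flags.

```python
import torch
from torch import nn

model = nn.Linear(4, 2)
model.weight.requires_grad = False  # freeze the weight

sd = model.state_dict()  # tensor values only; no requires_grad flags carried

fresh = nn.Linear(4, 2)      # new instance: all params require grad by default
fresh.load_state_dict(sd)    # copies values, not flags
assert fresh.weight.requires_grad  # still True: the frozen flag wasn't restored
```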
So, given that I have to freeze layers after a few epochs, do I have any option other than defining a new optimizer?