What state_dict to save for model fine-tuning

Hi, I am training a model on synthetic data and wish to fine-tune it on my real dataset. Which state_dicts should I save so I can continue training on the real dataset?

Is saving model.state_dict() enough for fine-tuning, or should I save optimizer.state_dict() too?

For a general fine-tuning use case, storing model.state_dict() might be sufficient; this is also what happens when you fine-tune e.g. the torchvision models (you can’t load an optimizer.state_dict() there, as none is provided).
However, if you would like to “continue” the training, you should store the optimizer.state_dict() as well (and additionally the learning rate scheduler’s state_dict, if you use one).
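For illustration, a minimal sketch of saving and restoring all three state_dicts; the model, the hyperparameters, and the file name are placeholders, not anything specific to your setup:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for your actual model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)

# Save everything needed to resume training later
torch.save({
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "scheduler": scheduler.state_dict(),
}, "checkpoint.pth")

# Later: restore all three to continue training where you left off
checkpoint = torch.load("checkpoint.pth")
model.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])
scheduler.load_state_dict(checkpoint["scheduler"])
```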


Thanks for the answer. I am curious: does loading optimizer.state_dict() make a big difference? I found no difference between loading and not loading optimizer.state_dict() in both fine-tuning and “continue” training.

If the optimizer has internal state (e.g. Adam, which tracks running averages of the gradients), I would expect you to see a difference.
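You can check this yourself; a small sketch (the model here is just a toy stand-in) showing that Adam’s state_dict contains per-parameter running averages after a single step:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step populates the optimizer's internal state
loss = model(torch.randn(8, 4)).sum()
loss.backward()
optimizer.step()

state = optimizer.state_dict()["state"]
print(state[0].keys())  # dict_keys(['step', 'exp_avg', 'exp_avg_sq'])
```

If you don’t load this state, a restarted Adam run begins with zeroed moment estimates, so the first updates after resuming will differ.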

I see, maybe in my case there is no significant difference. But I still have one question: is it good practice to load both model.state_dict() and optimizer.state_dict() when fine-tuning on another dataset, i.e. treating it like “continue” training?

Were you able to find an answer to this question anywhere?