Deepcopying the optimizer won’t work: it will either give you another optimizer still tied to model1, or, more likely, simply break the references to the model parameters.
Here is what I would do…
model2 = Model()  # new instance of the same architecture
model2.load_state_dict(model1.state_dict())  # copy parameter values
opt2 = torch.optim.Adam(model2.parameters(), lr=0.0001)  # new optimizer bound to model2's parameters
opt2.load_state_dict(opt1.state_dict())  # copy optimizer state (momentum buffers, step counts, etc.)
Let us assume we want to copy a model and then fine-tune the copy as normal after that. Let us also assume that we don’t need to resume the optimizer that was used to train the original model. We just create a totally new optimizer and start training, as in normal fine-tuning.
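A minimal sketch of that scenario (using a toy nn.Linear as a stand-in for the thread's Model class, and an arbitrary learning rate):

```python
import copy

import torch
import torch.nn as nn

# Stand-in for the original trained model.
model1 = nn.Linear(4, 2)

# Copy the model only; the old optimizer state is intentionally discarded.
model2 = copy.deepcopy(model1)

# A brand-new optimizer bound to the copy's parameters.
opt2 = torch.optim.Adam(model2.parameters(), lr=1e-4)

# One fine-tuning step on the copy; model1 is left untouched.
x, y = torch.randn(8, 4), torch.randn(8, 2)
loss = nn.functional.mse_loss(model2(x), y)
loss.backward()
opt2.step()
```

After opt2.step(), model2 has moved away from model1, while model1's parameters are unchanged, which is exactly what you want for this kind of fine-tuning.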
With that in mind, I am very curious: what is wrong with deepcopy, and why do we need to use load_state_dict instead?
I read some discussion on pytorch.org, and some well-known people (e.g. @ptrblck, @albanD, @apaszke, to name a few) suggest using copy.deepcopy. But I have heard some concerns about that, as in this post. Could someone elaborate more on this? Thanks!
deepcopy should work in your use case, since you are not trying to copy the optimizer.
I personally prefer the explicit approach of creating new objects and loading the state_dicts I want.
As you can see from the previous posts, deepcopying the model and optimizer might create new objects, but the parameter references held by the optimizer wouldn’t be updated to point at the new model’s parameters.
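To make that concrete, here is a small sketch (again with a toy nn.Linear in place of the thread's Model) showing that deepcopying only the model leaves the original optimizer pointing at the original parameters, which is why a fresh optimizer plus load_state_dict is the safe pattern:

```python
import copy

import torch
import torch.nn as nn

# Toy stand-ins for model1/opt1 from the thread.
model1 = nn.Linear(3, 1)
opt1 = torch.optim.Adam(model1.parameters(), lr=0.001)

# Deepcopy the model only: opt1 still references model1's parameter tensors.
model2 = copy.deepcopy(model1)
assert opt1.param_groups[0]['params'][0] is model1.weight      # still the old tensors
assert opt1.param_groups[0]['params'][0] is not model2.weight  # not the copy's

# So opt1.step() would keep updating model1 and never touch model2.
# The fix: create a new optimizer over model2 and copy opt1's state into it.
opt2 = torch.optim.Adam(model2.parameters(), lr=0.001)
opt2.load_state_dict(opt1.state_dict())
```

Now opt2 holds references to model2's parameters while carrying over opt1's hyperparameters and state, which is the explicit pattern recommended above.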