Does deepcopying the optimizer of one model work across model copies, or should I create a new optimizer every time?

I am trying to deep copy models.

The threads "Are there any recommended methods to clone a model?" and "Copying weights from one net to another"
recommend using copy.deepcopy(), which works, but shouldn't I deep copy the optimizer as well?

I am confused between the following two:

model2 = copy.deepcopy(model1)
opt2 = torch.optim.Adam(model2.parameters(), lr=0.0001)

vs

model2 = copy.deepcopy(model1)
opt2 = copy.deepcopy(opt1)   # we deepcopy model1, so deepcopy its optimizer too

Which approach is better, and is either of them incorrect?

Thanks.


Deepcopying the optimiser won't work, because it will either give you another optimiser for model1 or, more likely, simply break the references to the model parameters.

Here is what I would do…

model2 = Model()  # get a new instance
model2.load_state_dict(model1.state_dict())  # copy model state
opt2 = torch.optim.Adam(model2.parameters(), lr=0.0001)  # get a new optimiser
opt2.load_state_dict(opt1.state_dict())  # copy optimiser state
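
As a quick sanity check of the reference behaviour (my own illustration, using a toy nn.Linear as a stand-in for Model):

import copy
import torch
import torch.nn as nn

model1 = nn.Linear(4, 2)  # toy stand-in for Model
opt1 = torch.optim.Adam(model1.parameters(), lr=0.0001)

model2 = nn.Linear(4, 2)
model2.load_state_dict(model1.state_dict())
opt2 = torch.optim.Adam(model2.parameters(), lr=0.0001)
opt2.load_state_dict(opt1.state_dict())

# the rebuilt optimiser references model2's parameters ...
print(opt2.param_groups[0]['params'][0] is model2.weight)  # True

# ... while a deepcopied optimiser references detached copies owned by neither model
opt_bad = copy.deepcopy(opt1)
print(opt_bad.param_groups[0]['params'][0] is model1.weight)  # False
print(opt_bad.param_groups[0]['params'][0] is model2.weight)  # False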

Does this mean opt2 will be just another optimizer for model1, because it gets the param_groups from opt1?

Yes, it gets the hyperparameters from opt1, but opt2 will operate on a different set of parameters (model2's), with the state copied at the checkpoint.

Let us assume we want to copy a model and then fine-tune it as normal. Let us also assume that we don't need to resume the optimizer that was used to train the model before; we just create a totally new optimizer and start training, as in normal fine-tuning.

With that in mind, I am very curious: what is wrong with deepcopy, and why do we need to use load_state_dict instead?

I read some discussions on pytorch.org, and some well-known people (e.g. @ptrblck, @albanD, @apaszke, to name a few) suggest using copy.deepcopy. But I have heard some concerns about that, as in this post. Could someone elaborate on this? Thanks!

deepcopy should work in your use case, since you are not trying to copy the optimizer.
I personally prefer the explicit approach of creating new objects and loading the state_dicts I want.
As you can see from the previous posts, copying the model and optimizer might create new objects, but the parameter references in the optimizer wouldn't be updated to point at the new model.
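
To see the consequence concretely, here is a minimal sketch (my own, with a toy nn.Linear): after deepcopying both, stepping the copied optimizer does not touch model2's parameters at all.

import copy
import torch
import torch.nn as nn

model1 = nn.Linear(4, 2)
opt1 = torch.optim.Adam(model1.parameters(), lr=0.1)

model2 = copy.deepcopy(model1)
opt2 = copy.deepcopy(opt1)  # still references (copies of) model1's parameters

before = model2.weight.clone()
model2(torch.randn(8, 4)).pow(2).mean().backward()
opt2.step()  # steps its own detached parameter copies, which have no grads

print(torch.equal(before, model2.weight))  # True: model2 was never updated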


For anyone who wants to copy a model (model1) and its optimizer (optimizer1) without knowing beforehand which type of optimizer it is, I use:

model2 = copy.deepcopy(model1)
optimizer2 = type(optimizer1)(model2.parameters())
optimizer2.load_state_dict(optimizer1.state_dict())

Correction:

model2 = copy.deepcopy(model1)
optimizer2 = type(optimizer1)(model2.parameters(), lr=optimizer1.defaults['lr'])
optimizer2.load_state_dict(optimizer1.state_dict())
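
Wrapped into a small helper (the name clone_model_and_optimizer is my own; it assumes lr is the only constructor argument that cannot wait for load_state_dict, which restores the remaining hyperparameters and state):

import copy
import torch

def clone_model_and_optimizer(model1, optimizer1):
    # clone the model, then rebuild an optimizer of the same class for it
    model2 = copy.deepcopy(model1)
    optimizer2 = type(optimizer1)(model2.parameters(), lr=optimizer1.defaults['lr'])
    optimizer2.load_state_dict(optimizer1.state_dict())  # copies hyperparameters and state
    return model2, optimizer2

# usage: model2, optimizer2 = clone_model_and_optimizer(model1, optimizer1)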

Would we do the same if we wanted to make a copy of a scheduler?

scheduler2 = type(scheduler1)(optimizer2)
scheduler2.load_state_dict(scheduler1.state_dict())
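
For a concrete case (a sketch of mine assuming StepLR, which requires step_size at construction; load_state_dict should then overwrite it with scheduler1's value):

import torch
import torch.nn as nn

model1 = nn.Linear(4, 2)
optimizer1 = torch.optim.SGD(model1.parameters(), lr=0.1)
scheduler1 = torch.optim.lr_scheduler.StepLR(optimizer1, step_size=10)

model2 = nn.Linear(4, 2)
model2.load_state_dict(model1.state_dict())
optimizer2 = torch.optim.SGD(model2.parameters(), lr=0.1)
optimizer2.load_state_dict(optimizer1.state_dict())

# StepLR needs step_size up front; any placeholder works, because
# load_state_dict replaces it (and the internal counters) with scheduler1's values
scheduler2 = type(scheduler1)(optimizer2, step_size=1)
scheduler2.load_state_dict(scheduler1.state_dict())
print(scheduler2.step_size)  # 10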