Copying weights from one net to another

rbrigden · March 30, 2017, 9:34pm

Is there a canonical method to copy weights from one network to another of identical structure?

fmassa · March 31, 2017, 1:25am

You can use load_state_dict and state_dict for that.

net1.load_state_dict(net2.state_dict())

You can also deep copy a model via copy.deepcopy.
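A minimal sketch of both approaches side by side. `TinyNet` is a hypothetical stand-in for the real network and is not from this thread:

```python
import copy

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the real network."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


net1 = TinyNet()
net2 = TinyNet()  # same structure, different random initialization

# Option 1: copy parameters and buffers into an existing network.
net1.load_state_dict(net2.state_dict())

# Option 2: build a brand-new, independent copy of the whole module.
net3 = copy.deepcopy(net2)

# All three networks should now hold equal values in separate tensors.
same_values = all(
    torch.equal(p1, p2) and torch.equal(p2, p3)
    for p1, p2, p3 in zip(net1.parameters(), net2.parameters(), net3.parameters())
)
shares_storage = net1.fc.weight.data_ptr() == net2.fc.weight.data_ptr()
print(same_values, shares_storage)
```

Either way the destination gets its own tensors, so later training of one network should not change the other.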

FuriouslyCurious · March 31, 2017, 3:00am

How does deep copy / canonical copy differ from normal weights loading?

fmassa · March 31, 2017, 3:05am

It saves you from having to do

model1 = Model()
model2 = Model()

model2.load_state_dict(model1.state_dict())

and instead you only do

import copy

model1 = Model()
model2 = copy.deepcopy(model1)

shubhamjain0594 · April 18, 2017, 6:47am

As far as I can tell from the code, “load_state_dict copies only parameters and buffers”.

Does deepcopy also copy only _parameters and _buffers, or the hooks as well?

qq456cvb · January 26, 2018, 2:27am

Deep copy recursively copies every member of an object, so it copies everything, hooks included.
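One way to check this yourself is to register a forward hook and see which copy still fires it. This is only a sketch: the `record_call` hook and the `calls` list are made up for illustration, and the behavior shown is what I would expect from current PyTorch releases.

```python
import copy

import torch
import torch.nn as nn

calls = []


def record_call(module, inputs, output):
    # Forward hook: note that a forward pass happened.
    calls.append(type(module).__name__)


original = nn.Linear(3, 1)
original.register_forward_hook(record_call)

# deepcopy duplicates the module object, including its hook registry.
cloned = copy.deepcopy(original)

# load_state_dict only fills in the parameters and buffers of a fresh module.
loaded = nn.Linear(3, 1)
loaded.load_state_dict(original.state_dict())

x = torch.randn(2, 3)
cloned(x)
calls_after_clone = len(calls)   # the copied hook should have fired
loaded(x)
calls_after_loaded = len(calls)  # no hook on this module, so no new entry
print(calls_after_clone, calls_after_loaded)
```

Note that the hook function itself is not duplicated, only the registration, so both modules report into the same `calls` list.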

spnova12 · February 19, 2018, 11:46am

In my experience, the optimizer does not work after using deepcopy …

Navneet_M_Kumar · March 1, 2018, 12:10pm

I want to copy part of the weights from one network to another,
using something like Polyak averaging.

Example:

weights_new = k*weights_old + (1-k)*weights_new

How can I do this?

bhushans23 · March 4, 2018, 5:56pm

Right. How should we go about it then?
Deepcopy the optimizer as well?

D-X-Y · March 19, 2018, 9:38am

Hi, have you found an effective way to do this?

Navneet_M_Kumar · March 19, 2018, 3:52pm

t = polyak_constant
target_dqn_model.conv1.weight.data = (
    t * dqn_model.conv1.weight.data
    + (1 - t) * target_dqn_model.conv1.weight.data
)

I am doing this for each layer. I believe there must be a better method but this works for now.

Navneet_M_Kumar · March 19, 2018, 3:53pm

roee · March 19, 2018, 4:26pm

I think you need to reinitialize the optimizer with the newly copied model. After that you can copy the optimizer's internal values from one to the other, but that is a bit of a mess. I would probably stop at reinitializing the optimizer.
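A rough sketch of that idea, assuming an Adam optimizer and a single `nn.Linear` layer as the model (both chosen only for illustration). The optional last step uses the optimizer's own `state_dict` / `load_state_dict` to carry over the internal values:

```python
import copy

import torch
import torch.nn as nn

model1 = nn.Linear(4, 2)
opt1 = torch.optim.Adam(model1.parameters(), lr=1e-3)

# Take one step so that opt1 has internal state (Adam's moment estimates).
model1(torch.randn(8, 4)).sum().backward()
opt1.step()

# Copy the model, then build a fresh optimizer bound to the copy's parameters.
model2 = copy.deepcopy(model1)
opt2 = torch.optim.Adam(model2.parameters(), lr=1e-3)

# Optional: carry over the optimizer's internal values as well.
opt2.load_state_dict(opt1.state_dict())

# opt2 tracks model2's parameters, not model1's.
tracked = {id(p) for group in opt2.param_groups for p in group["params"]}
print(tracked == {id(p) for p in model2.parameters()})
```

The key point is that an optimizer holds references to specific parameter tensors, so the new optimizer has to be constructed from the copied model's parameters.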

bhushans23 · March 19, 2018, 4:38pm

Yes. Here is a way to do so, in the thread “Does deepcopying optimizer of one model works across the model? or should I create new optimizer every time?”

roee · March 20, 2018, 11:47am

Less messy than what I assumed, good to know! thx!

Brando_Miranda · March 31, 2018, 3:14am

Does anyone know how to properly solve the Polyak averaging issue mentioned above? I believe the solution mentioned doesn’t work.

Navneet_M_Kumar · April 1, 2018, 9:11am

def polyak_update(polyak_factor, target_network, network):
    # Blend each parameter of `network` into `target_network`, in place.
    for target_param, param in zip(target_network.parameters(), network.parameters()):
        target_param.data.copy_(polyak_factor*param.data + target_param.data*(1.0 - polyak_factor))

Source: https://github.com/navneet-nmk/pytorch-rl/blob/master/train_ddpg.py
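For anyone who wants to sanity-check the update, here is a small variant with values that are easy to verify by hand. It is a sketch, not the code from the linked repository: the name `soft_update` and the parameter `tau` are my own, and it uses `torch.no_grad()` with in-place ops instead of `.data`.

```python
import torch
import torch.nn as nn


def soft_update(target_network, network, tau):
    """Blend `network` into `target_network`: target = tau*source + (1 - tau)*target."""
    with torch.no_grad():
        for target_param, param in zip(target_network.parameters(), network.parameters()):
            target_param.mul_(1.0 - tau).add_(param, alpha=tau)


source = nn.Linear(2, 2)
target = nn.Linear(2, 2)

# Easy-to-check values: every source parameter is 1, every target parameter is 0.
with torch.no_grad():
    for p in source.parameters():
        p.fill_(1.0)
    for p in target.parameters():
        p.zero_()

soft_update(target, source, tau=0.1)
print(target.weight)  # every entry should be 0.1*1 + 0.9*0 = 0.1
```

Keep in mind that `parameters()` does not include buffers such as BatchNorm running statistics, so those would need separate handling if the network has any.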

Krishna_Garg · August 5, 2020, 10:06pm

load_state_dict worked for me.

copy.deepcopy failed for me with the runtime error “Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment”.
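I don't know what triggered it in the case above, but one way this kind of error can arise is when the module keeps a computed tensor (a non-leaf in the autograd graph) as a plain attribute. The sketch below uses a hypothetical `CachingNet` to show that situation and falls back to `load_state_dict` when `deepcopy` raises:

```python
import copy

import torch
import torch.nn as nn


class CachingNet(nn.Module):
    """Hypothetical module that keeps a computed (non-leaf) tensor as an attribute."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3, 3)
        # Result of an operation on a parameter: part of the autograd graph, not a leaf.
        self.cached = self.fc.weight * 2

    def forward(self, x):
        return self.fc(x)


model = CachingNet()

try:
    clone = copy.deepcopy(model)
    deepcopy_failed = False
except RuntimeError:
    deepcopy_failed = True
    # Fallback: build a fresh instance and copy parameters and buffers only.
    clone = CachingNet()
    clone.load_state_dict(model.state_dict())

weights_match = torch.equal(clone.fc.weight, model.fc.weight)
print(deepcopy_failed, weights_match)
```

The fallback works because `state_dict()` contains only parameters and buffers, so the problematic attribute never takes part in the copy.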