Copying weights from one net to another

Is there a canonical method to copy weights from one network to another of identical structure?

You can use load_state_dict and state_dict for that.

net1.load_state_dict(net2.state_dict())

You can also deep copy a model via copy.deepcopy.
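
A minimal sketch comparing the two approaches. The `nn.Linear(4, 2)` layers are arbitrary stand-ins (not from the original post) for any two networks of identical structure. In both cases the copy should end up with its own storage, so later in-place changes to the source do not leak into it.

```python
import copy

import torch
import torch.nn as nn

# Two networks with identical structure (sizes are arbitrary, for illustration).
net1 = nn.Linear(4, 2)
net2 = nn.Linear(4, 2)

# Option 1: copy parameters and buffers into an already constructed network.
net1.load_state_dict(net2.state_dict())

# Option 2: build a completely new, independent object.
net3 = copy.deepcopy(net2)

# Both copies now hold the same values as the source.
same_values = torch.equal(net1.weight, net2.weight) and torch.equal(net3.weight, net2.weight)

# Modify the source in place; the copies keep their old values.
with torch.no_grad():
    net2.weight.add_(1.0)
copies_unchanged = (
    not torch.equal(net1.weight, net2.weight)
    and not torch.equal(net3.weight, net2.weight)
)
```

Option 1 needs a second instance to exist already; option 2 creates it for you.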

How does deep copy / canonical copy differ from normal weights loading?

It saves you from having to do

model1 = Model()
model2 = Model()

model2.load_state_dict(model1.state_dict())

and instead you only do

import copy

model1 = Model()
model2 = copy.deepcopy(model1)

As far as I can tell from the code, load_state_dict copies only parameters and buffers.

Does deepcopy also copy only _parameters and _buffers, or the hooks as well?

deepcopy recursively copies every member of an object, so it copies everything, including the hooks.
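
A small sketch of that difference, under the assumption that a forward hook is a representative example of "everything else". The names `record_call`, `source`, `cloned` and `loaded` are made up for illustration.

```python
import copy

import torch
import torch.nn as nn

calls = []

def record_call(module, inputs, output):
    # Forward hook: remember which module instance just ran.
    calls.append(module)

source = nn.Linear(3, 1)
source.register_forward_hook(record_call)

# deepcopy duplicates the whole module object, registered hooks included.
cloned = copy.deepcopy(source)

# load_state_dict only transfers parameters and buffers,
# so a freshly constructed module still has no hooks.
loaded = nn.Linear(3, 1)
loaded.load_state_dict(source.state_dict())

x = torch.randn(2, 3)
cloned(x)  # fires the copied hook
loaded(x)  # no hook registered on this module
```

After running this, `calls` should contain only `cloned`, since the hook travelled with the deep copy but not with the state dict.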

In my experience, the optimizer does not work after using deepcopy …

I want to copy part of the weights from one network to another, using something like Polyak averaging.

Example:

weights_new = k*weights_old + (1-k)*weights_new

How can I do this?

Right. How should we go about it then?
By deepcopying the optimizer as well?

Hi, have you found an effective way to do this?

t = polyak_constant
target_dqn_model.conv1.weight.data = (
    t * dqn_model.conv1.weight.data
    + (1 - t) * target_dqn_model.conv1.weight.data
)

I am doing this for each layer. I believe there must be a better method but this works for now.

I think you need to reinitialize the optimizer using the newly copied model; then you can copy the optimizer's internal state from one to the other, which is a bit of a mess. I would probably stop at reinitializing the optimizer.

Yes. Here is a way to do so: "Does deepcopying optimizer of one model works across the model? or should I create new optimizer every time?"

Less messy than what I assumed, good to know! thx!
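
For reference, a sketch of the "reinitialize, then copy the state" idea discussed above. An optimizer keeps references to the exact parameter tensors it was given, so one built for the original model keeps updating the original. The choice of SGD with momentum and the layer sizes are assumptions for illustration only.

```python
import copy

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Take one step so the optimizer holds per-parameter state (momentum buffers).
model(torch.randn(8, 4)).sum().backward()
optimizer.step()

# Copy the model, then build a NEW optimizer bound to the copy's parameters.
model_copy = copy.deepcopy(model)
optimizer_copy = torch.optim.SGD(model_copy.parameters(), lr=0.1, momentum=0.9)

# Transfer hyperparameters and internal state (here: the momentum buffers).
optimizer_copy.load_state_dict(optimizer.state_dict())

# Stepping the new optimizer updates the copy and leaves the original alone.
before = model.weight.detach().clone()
optimizer_copy.zero_grad()
model_copy(torch.randn(8, 4)).sum().backward()
optimizer_copy.step()
```

If the accumulated state does not matter for your use case, skipping the `load_state_dict` call and starting with a fresh optimizer is the simpler option.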

Does anyone know how to properly solve the Polyak averaging issue mentioned? I believe the solution mentioned doesn't work:

def polyak_update(polyak_factor, target_network, network):
    for target_param, param in zip(target_network.parameters(), network.parameters()):
        target_param.data.copy_(polyak_factor*param.data + target_param.data*(1.0 - polyak_factor))

Source: https://github.com/navneet-nmk/pytorch-rl/blob/master/train_ddpg.py
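
A possible variant of the function above that avoids `.data` by running under `torch.no_grad()` and using the in-place `lerp_`. This is only a sketch under the same convention (the factor weights the source network); the two `nn.Linear` networks are placeholders. Note that `parameters()` does not include buffers such as BatchNorm running statistics, so those would need separate handling.

```python
import torch
import torch.nn as nn

def polyak_update(polyak_factor, target_network, network):
    """In place: target <- polyak_factor * source + (1 - polyak_factor) * target."""
    with torch.no_grad():
        for target_param, param in zip(target_network.parameters(), network.parameters()):
            # lerp_(end, weight) computes self + weight * (end - self).
            target_param.lerp_(param, polyak_factor)

# Placeholder networks of identical structure.
network = nn.Linear(4, 2)
target_network = nn.Linear(4, 2)

# Reference value computed by hand, for comparison with the update.
expected = 0.1 * network.weight.detach() + 0.9 * target_network.weight.detach()
polyak_update(0.1, target_network, network)
```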

load_state_dict worked for me.

copy.deepcopy failed for me with the runtime error “Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment”.
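
One way to run into this error is having a non-leaf tensor (the result of an operation on a tensor that requires grad) somewhere inside the object being copied. Whether that was the cause in the post above is an assumption; the sketch below only reproduces the behaviour on a bare tensor and shows that detaching first avoids it.

```python
import copy

import torch

leaf = torch.ones(3, requires_grad=True)  # created directly by the user: a graph leaf
non_leaf = leaf * 2                       # produced by an operation: not a leaf

copy.deepcopy(leaf)  # works

try:
    copy.deepcopy(non_leaf)
    deepcopy_failed = False
except RuntimeError:
    deepcopy_failed = True

# A detached tensor carries no autograd history, so it can be deep-copied.
safe_copy = copy.deepcopy(non_leaf.detach())
```

This would also explain why load_state_dict can work in the same situation: it copies only the registered parameters and buffers.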