Is there a canonical method to copy weights from one network to another of identical structure?
You can use load_state_dict and state_dict for that:
net1.load_state_dict(net2.state_dict())
You can also deep copy a model via copy.deepcopy.
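For example, a minimal sketch (nn.Linear and the model1/model2 names are just placeholders):

import copy
import torch.nn as nn

model1 = nn.Linear(4, 2)

# deepcopy builds an independent clone of the whole module, parameters and buffers included
model2 = copy.deepcopy(model1)

# the weights match, but the tensors are separate objects
print(model2.weight.equal(model1.weight))   # True
print(model2.weight is model1.weight)       # False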
How does deep copy / canonical copy differ from normal weights loading?
It saves you from having to do
model1 = Model()
model2 = Model()
model2.load_state_dict(model1.state_dict())
and instead just do
model1 = Model()
model2 = copy.deepcopy(model1)
As far as I have seen in the code, “load_state_dict copies only parameters and buffers”.
Does deepcopy also copy only _parameters and _buffers, or the hooks as well?
deepcopy will recursively copy every member of an object, so it copies everything.
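A quick way to check that, as a sketch (throwaway nn.Linear, dummy hook; note that _forward_hooks is an internal attribute):

import copy
import torch.nn as nn

model1 = nn.Linear(4, 2)
model1.register_forward_hook(lambda module, inp, out: None)  # dummy hook

model2 = copy.deepcopy(model1)                # everything is copied, hook included
print(len(model2._forward_hooks))             # 1

model3 = nn.Linear(4, 2)
model3.load_state_dict(model1.state_dict())   # only parameters and buffers transfer
print(len(model3._forward_hooks))             # 0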
In my experience, if you use deepcopy, the optimizer does not work …
I want to copy part of the weights from one network to another, using something like Polyak averaging. Example:
weights_new = k*weights_old + (1-k)*weights_new
How can I do this?
Right. How should we go about it then? Deepcopying the optimizer as well?
Hi, have you found an effective way to do this?
t = polyak_constant
target_dqn_model.conv1.weight.data = t*(dqn_model.conv1.weight.data) + (1-t)*(target_dqn_model.conv1.weight.data)
I am doing this for each layer. I believe there must be a better method, but this works for now.
I think you need to reinitialize the optimizer using the newly copied model, and then you can copy the optimizer's inner values from one to the other. It's a bit of a mess; I would probably stop at reinitializing the optimizer.
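Roughly what I mean, as a sketch (assuming Adam and a throwaway nn.Linear; adapt to your own model and optimizer):

import copy
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# copy the model, then reinitialize the optimizer on the copy's parameters
model_copy = copy.deepcopy(model)
optimizer_copy = torch.optim.Adam(model_copy.parameters(), lr=1e-3)

# transfer the optimizer's inner values (step counts, moment estimates, ...)
optimizer_copy.load_state_dict(optimizer.state_dict())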
Yes. Here is a way to do so - Does deepcopying optimizer of one model works across the model? or should I create new optimizer every time?
Less messy than what I assumed, good to know! thx!
Does anyone know how to properly solve the Polyak averaging issue mentioned above? The solution mentioned doesn’t work, I believe:
def polyak_update(polyak_factor, target_network, network):
    for target_param, param in zip(target_network.parameters(), network.parameters()):
        target_param.data.copy_(polyak_factor*param.data + target_param.data*(1.0 - polyak_factor))
Source: https://github.com/navneet-nmk/pytorch-rl/blob/master/train_ddpg.py
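For what it's worth, here is a variant I would try (just a sketch, not the repo's code): do the update in-place under torch.no_grad() and also sync the buffers, keeping the convention that polyak_factor weights the online network:

import torch

def polyak_update(polyak_factor, target_network, network):
    with torch.no_grad():
        for target_param, param in zip(target_network.parameters(), network.parameters()):
            # target <- polyak_factor * online + (1 - polyak_factor) * target
            target_param.mul_(1.0 - polyak_factor)
            target_param.add_(polyak_factor * param)
        # buffers (e.g. BatchNorm running stats) are not in parameters(); copy them directly
        for target_buffer, buffer in zip(target_network.buffers(), network.buffers()):
            target_buffer.copy_(buffer)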
load_state_dict worked for me.
copy.deepcopy failed for me with a runtime error: “Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment”.