I want to copy part of the weights from one network to another, using something like Polyak averaging. Example:
weights_new = k*weights_old + (1-k)*weights_new
This is required to implement DDPG. How can I do this?
Something like this should do it:
# per layer and per weight parameter
other_model.layer.weight.data = k * model.layer.weight.data + (1 - k) * other_model.layer.weight.data
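Here layer is a placeholder: this pattern assumes the model exposes each submodule as a named attribute, which is not the case for every container.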
Your solution is missing a for loop, no? How do you actually do this with a for loop?
Error message:
>>> net
Sequential(
  (0): Linear(in_features=2, out_features=2)
  (1): Linear(in_features=2, out_features=2)
)
>>> net.layer
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/brandomiranda/miniconda3/envs/pytorch_overparam/lib/python3.6/site-packages/torch/nn/modules/module.py", line 366, in __getattr__
type(self).__name__, name))
AttributeError: 'Sequential' object has no attribute 'layer'
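That AttributeError is expected: nn.Sequential has no attribute named layer; its submodules are accessed by integer index, e.g. net[0].weight for the first Linear layer. As for the for loop, a generic loop over parameters avoids naming layers entirely. A minimal sketch, assuming both networks share the same architecture so their parameters line up under zip (old_net and new_net are placeholder names):

import torch

k = 0.995  # Polyak coefficient (illustrative value)
with torch.no_grad():  # in-place parameter updates must bypass autograd
    for w_old, w_new in zip(old_net.parameters(), new_net.parameters()):
        # w_new <- k * w_old + (1 - k) * w_new
        w_new.mul_(1 - k).add_(k * w_old)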
Real solution:
beta = 0.5  # the interpolation parameter
params1 = model1.named_parameters()
params2 = model2.named_parameters()
dict_params2 = dict(params2)
for name1, param1 in params1:
    if name1 in dict_params2:
        dict_params2[name1].data.copy_(beta * param1.data + (1 - beta) * dict_params2[name1].data)
model2.load_state_dict(dict_params2)  # 'model' was undefined in the original; load into model2
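Note that dict(model2.named_parameters()) holds references to model2's live parameters, so the copy_ call already writes the interpolated values into model2 in place. Also, named_parameters() yields only parameters, not buffers, which matters for the next step.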
named_parameters() doesn't work well with my code: I got a "missing keys" error from load_state_dict().
state_dict() is the solution.
beta = 0.5  # the interpolation parameter
params1 = model1.state_dict()
params2 = model2.state_dict()
dict_params2 = dict(params2)
for name1, param1 in params1.items():  # a state_dict is a dict, so iterate over items()
    if name1 in dict_params2:
        dict_params2[name1].data.copy_(beta * param1.data + (1 - beta) * dict_params2[name1].data)
model2.load_state_dict(dict_params2)  # again, load into model2 rather than an undefined 'model'
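state_dict() includes buffers (e.g. batch norm running statistics) as well as parameters, so load_state_dict() finds every key it expects and the "missing keys" error goes away.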
I tried to apply this to load pretrained weights from resnet18, and it failed on loading batchnorm running_mean.
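One possible cause (an assumption, not confirmed in this thread): a resnet18 state_dict also contains integer buffers such as num_batches_tracked in the BatchNorm layers, and scaling an integer tensor by a float beta does not behave like the float case. A sketch that interpolates only floating-point entries and copies the rest through unchanged:

import torch

beta = 0.5  # the interpolation parameter
params1 = model1.state_dict()
params2 = model2.state_dict()
for name, param in params1.items():
    if name in params2:
        if torch.is_floating_point(param):
            params2[name].copy_(beta * param + (1 - beta) * params2[name])
        else:
            # integer buffers (e.g. num_batches_tracked) can't be meaningfully
            # interpolated, so copy them over as-is
            params2[name].copy_(param)
model2.load_state_dict(params2)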