Copy weights only from a network's parameters

erilyth · August 6, 2017, 10:44am

Given two sets of parameters, I want to copy only the weights from one set of parameters to the other. Is it possible to do this?

fmassa · August 7, 2017, 10:23am

I think you can do something like this:

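params1 = model1.named_parameters()
params2 = model2.named_parameters()

# Index model2's parameters by name so they can be looked up directly.
dict_params2 = dict(params2)

# Copy in place every parameter of model1 whose name also exists in model2.
for name1, param1 in params1:
    if name1 in dict_params2:
        dict_params2[name1].data.copy_(param1.data)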

chenchr · December 1, 2017, 5:39am

@fmassa
Hello, can this be achieved with load_state_dict? If net1() is built on top of net2() with some extra layers added, and I want to fine-tune net1() using net2()'s weights, can I use load_state_dict directly? Thanks!
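
A minimal sketch of the scenario in this question, assuming the extra layers only add new parameter names and the shared layers keep the same names and shapes. Net2, Net1, and the fc1/fc2/fc3 layers are hypothetical stand-ins, not code from this thread. With strict=False, load_state_dict skips keys that are missing on either side and reports them; it still raises an error if a shared key has a mismatched shape.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: Net2 is the base network, Net1 adds one layer on top.
class Net2(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 8)

class Net1(Net2):
    def __init__(self):
        super().__init__()
        self.fc3 = nn.Linear(8, 2)  # extra layer that Net2 does not have

net2 = Net2()
net1 = Net1()

# strict=False tolerates keys present in only one of the two networks.
# The returned object lists what was skipped.
result = net1.load_state_dict(net2.state_dict(), strict=False)
missing = sorted(result.missing_keys)        # in net1 but not in net2's state dict
unexpected = sorted(result.unexpected_keys)  # in net2's state dict but not in net1

# The shared layers now hold net2's values.
shared_ok = torch.equal(net1.fc1.weight, net2.fc1.weight)
```

Checking missing_keys after the call is a cheap way to confirm that only the newly added layers were left at their initial values.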

dEathEater · March 26, 2018, 2:56am

@chenchr I think it works. One can try model.fc4.load_state_dict(model.fc3.state_dict()) to update the fc4 layer’s parameters with those of the fc3 layer, provided the two layers have the same shape.

@fmassa, however, I am not sure whether clone(), state_dict(), or deepcopy would be a better choice. It would be great if you could elaborate on the differences!
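
A small sketch of how the three options behave, using a hypothetical nn.Linear layer rather than anything from this thread. The tensors returned by state_dict() share storage with the live parameters, clone() makes an independent copy of a single tensor, and copy.deepcopy makes an independent copy of the whole module.

```python
import copy
import torch
import torch.nn as nn

layer = nn.Linear(3, 3)  # hypothetical layer used only for illustration

sd = layer.state_dict()                   # tensors share storage with the parameters
snapshot = layer.weight.detach().clone()  # independent copy of one tensor
twin = copy.deepcopy(layer)               # independent copy of the whole module

# Modify the original layer in place.
with torch.no_grad():
    layer.weight.add_(1.0)

# The state_dict entry follows the change because it aliases the same memory.
state_dict_follows = torch.equal(sd["weight"], layer.weight)
# The clone and the deep copy keep the old values.
clone_is_stale = not torch.equal(snapshot, layer.weight)
twin_is_stale = torch.equal(twin.weight, snapshot)
```

So to keep a frozen snapshot of the weights, copy.deepcopy(model.state_dict()) or per-tensor clone() is the safer choice; a plain state_dict() is fine when it is loaded into another module right away, since load_state_dict copies the values.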

aksg87 · April 28, 2020, 6:24am

Is there a better way to copy layer parameters from one model to another in 2020 (when trying to transfer a trained encoder or something else)?

I created this helper function per the discussion above but it doesn’t seem to be working as expected!

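def copyParams(module_src, module_dest):
    # Copy parameters in place from module_src to module_dest, matched by name.
    params_src = module_src.named_parameters()
    params_dest = module_dest.named_parameters()

    dict_dest = dict(params_dest)

    for name, param in params_src:
        # Names that exist only in module_src are silently skipped.
        if name in dict_dest:
            dict_dest[name].data.copy_(param.data)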
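
One way to check a helper like this is to compare the two modules after the copy. The sketch below repeats the helper so that it runs on its own and uses a hypothetical make_encoder() with a BatchNorm layer. It also illustrates one caveat: named_parameters() does not include buffers such as BatchNorm running statistics, so those are not copied.

```python
import torch
import torch.nn as nn

def copyParams(module_src, module_dest):
    # Same logic as the helper above: copy parameters matched by name.
    dict_dest = dict(module_dest.named_parameters())
    for name, param in module_src.named_parameters():
        if name in dict_dest:
            dict_dest[name].data.copy_(param.data)

def make_encoder():
    # Hypothetical encoder, used only for this check.
    return nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4))

torch.manual_seed(0)
src, dest = make_encoder(), make_encoder()

# One forward pass in training mode updates src's BatchNorm running statistics.
src.train()
src(torch.randn(16, 4))

copyParams(src, dest)

# Every learnable parameter now matches.
dest_params = dict(dest.named_parameters())
params_match = all(
    torch.equal(param, dest_params[name])
    for name, param in src.named_parameters()
)

# Buffers were not copied, so the running mean still differs.
buffers_match = torch.equal(src[1].running_mean, dest[1].running_mean)
```

If the buffers matter (for example when the copied module is used in eval mode), dest.load_state_dict(src.state_dict()) copies both parameters and buffers.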

jose · May 1, 2020, 2:15pm

Any news on this? I am also moving forward implementing this function. Basically, I want to do some operations that will accumulate the gradient information in the form of a delta_params. Then I want to apply it to the original params, and replace the params in the model with original_params + delta_params.
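
A minimal sketch of that update, assuming delta_params is a dict of tensors keyed by parameter name. The model, the constant 0.5 step, and the dict layout are hypothetical. The write happens under torch.no_grad(), so autograd does not track it; if gradients need to flow through delta_params themselves, this in-place approach would not be enough.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)  # hypothetical model

# Snapshot of the original parameters, detached from the live tensors.
original_params = {n: p.detach().clone() for n, p in model.named_parameters()}

# Hypothetical accumulated update: a constant step for every parameter.
delta_params = {n: torch.full_like(p, 0.5) for n, p in model.named_parameters()}

# Replace each parameter's value with original + delta, in place.
with torch.no_grad():
    for name, param in model.named_parameters():
        param.copy_(original_params[name] + delta_params[name])

updated_ok = all(
    torch.allclose(p, original_params[n] + 0.5)
    for n, p in model.named_parameters()
)
```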

aksg87 · May 2, 2020, 9:31am

I believe that code actually does copy parameters as expected. I had another issue which is why I was getting strange results (I didn’t copy the original model’s decoder).