Sorry if this repeats an earlier question, but I cannot tell whether this behaviour comes from my code or from the way the network is adapted.
I have two models and I need to transfer part of their weights to a new model.
For example, say that from model A I only want to take the convolutional layers, while from model B I only take the fully connected (FC) layers.
Well, I apply the weight update like this:
model = model_in_code_from_autograd().cuda()
pretrain_model = torch.load("path/…/model.pt").cuda()
model_dict = model.state_dict()
pretrained_dict = pretrain_model.state_dict()
# keep only the entries whose keys also exist in the new model
pretrained_dict = {k: v for k, v in pretrained_dict.items() if k in model_dict}
model_dict.update(pretrained_dict)
model.load_state_dict(model_dict)
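For the split described above (convolutional layers from model A, FC layers from model B), a minimal sketch of the filtering step could look like the following. The model_a / model_b variables and the "conv" / "fc" key prefixes are only assumptions for illustration; the real prefixes depend on the attribute names used in your model classes.

import torch

# Hypothetical sketch: merge conv weights from model A with FC weights from
# model B. The "conv"/"fc" prefixes must match the actual state_dict keys.
model_dict = model.state_dict()
conv_part = {k: v for k, v in model_a.state_dict().items()
             if k.startswith("conv") and k in model_dict}
fc_part = {k: v for k, v in model_b.state_dict().items()
           if k.startswith("fc") and k in model_dict}
model_dict.update(conv_part)
model_dict.update(fc_part)
model.load_state_dict(model_dict)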
Now, to check whether the code above works, I do the following:
A. I compute the mean of every layer of the pretrained model.
B. I compute the mean of every layer of the new ("empty") model that will receive the weights.
C. I transfer the weights and compute the layer means of the new model again.
After the transfer, the mean of the new model is roughly the same as that of the pretrained model, so the weights have indeed been modified!
… Mean layers pretrain model is : 0.117181614
… Mean layers new-model is : 0.16264322
… Mean layer new-model after update is: 0.11407055
Here is the code:
import numpy as np
import torch

# Mean of the parameters of the pretrained model
params_model = model_v_fcnn.named_parameters()
list_mean_layer_a = []
for name_p, param_p in params_model:
    mean = torch.mean(model_v_fcnn.state_dict()[name_p]).data.cpu().numpy()
    list_mean_layer_a.append(mean)
print("\n ... Mean layers pretrain model is : " + str(np.mean(list_mean_layer_a)))

# Mean of the parameters of the new model before the transfer
params_model = model.named_parameters()
list_mean_layer_b = []
for name_p, param_p in params_model:
    mean = torch.mean(model.state_dict()[name_p]).data.cpu().numpy()
    list_mean_layer_b.append(mean)
print("\n ... Mean layers new-model is : " + str(np.mean(list_mean_layer_b)))

# Update the first part of the model
model_dict = model.state_dict()
pretrained_dict = model_v_fcnn.state_dict()
# 1. keep only the keys whose first module name does not start with "r"
filter_model = {}
for k, v in pretrained_dict.items():
    if k.split(".")[0][0] != "r":
        filter_model[k] = v
# 2. overwrite entries in the existing state dict
model_dict.update(filter_model)
# 3. load the new state dict
model.load_state_dict(model_dict)

# Update the second part of the model
model_dict = model.state_dict()
pretrained_dict = model_rv_fcnn.state_dict()
# 1. keep only the keys whose first module name starts with "r"
filter_model = {}
for k, v in pretrained_dict.items():
    if k.split(".")[0][0] == "r":
        filter_model[k] = v
# 2. overwrite entries in the existing state dict
model_dict.update(filter_model)
# 3. load the new state dict
model.load_state_dict(model_dict)

# Check that the new-model means have changed after the transfer
params_model = model.named_parameters()
list_mean_layer_c = []
for name_p, param_p in params_model:
    mean = torch.mean(model.state_dict()[name_p]).data.cpu().numpy()
    list_mean_layer_c.append(mean)
print("\n ... Mean layer new-model after update is: " + str(np.mean(list_mean_layer_c)))
However, when I run the model again, it seems to start from scratch (as if the weights were random):
A. Is this normal behaviour?
B. Do the weights need to be re-adjusted?
C. Is there an error in the way I update the weights?
Best,
Nico