I have three models: model, model1, and aggregated_model. The weights of aggregated_model are the mean of the weights of the first two.
In my function I have this:
import torch

PATH = args.model
PATH1 = args.model1
PATHAGG = args.model_agg

model = VGG16(1)
model1 = VGG16(1)
aggregated_model = VGG16(1)

# load_state_dict copies the weights in place; its return value is only a
# missing/unexpected-keys report, so there is no need to keep it
model.load_state_dict(torch.load(PATH))
model1.load_state_dict(torch.load(PATH1))

print("WEIGHTS MODEL BEFORE AGGREGATION:")
print(list(model.block_1[0].parameters())[0][0])
print("WEIGHTS MODEL1 BEFORE AGGREGATION:")
print(list(model1.block_1[0].parameters())[0][0])

# average the weights of model and model1 into aggregated_model
init_params(aggregated_model, model, model1)

# save the aggregated weights, then overwrite both input checkpoints so
# that the next iteration starts from three identical models
torch.save(aggregated_model.state_dict(), PATHAGG)
print("AGGREGATED MODEL WEIGHTS:")
print(list(aggregated_model.block_1[0].parameters())[0][0])
torch.save(aggregated_model.state_dict(), PATH)
torch.save(aggregated_model.state_dict(), PATH1)
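I don't show init_params here, but it just averages the corresponding parameters of the two models into aggregated_model. A minimal dependency-free sketch of that kind of averaging (the name average_state_dicts is mine, and plain Python lists stand in for tensors; with real torch state dicts this would be roughly `{k: (sd0[k] + sd1[k]) / 2 for k in sd0}`):

```python
def average_state_dicts(sd0, sd1):
    """Elementwise mean of two 'state dicts' (name -> list of floats)."""
    return {k: [(a + b) / 2 for a, b in zip(sd0[k], sd1[k])] for k in sd0}

# illustrative key and values, not the real VGG16 state dict
sd0 = {"block_1.0.weight": [-0.0588, -0.4677, 0.7516]}
sd1 = {"block_1.0.weight": [-0.1252, 0.3967, 0.8079]}

avg = average_state_dicts(sd0, sd1)
print(avg["block_1.0.weight"])  # each entry is the mean of the corresponding inputs
```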
So, basically, I pass my function three paths pointing to the state dicts saved by the previous iteration. I construct three freshly (randomly) initialised models so that I can load those state dicts into them. Then I perform the aggregation, and it works:
WEIGHTS MODEL BEFORE AGGREGATION:
tensor([[[-0.0588, -0.4677, 0.7516],
[ 0.7967, 0.6580, 0.2423],
[-0.4919, 0.8133, 0.3545]]], grad_fn=<SelectBackward0>)
WEIGHTS MODEL1 BEFORE AGGREGATION:
tensor([[[-0.1252, 0.3967, 0.8079],
[ 0.6157, -0.4270, -0.5611],
[ 0.3319, 0.7574, -0.4349]]], grad_fn=<SelectBackward0>)
AGGREGATED MODEL WEIGHTS:
tensor([[[-0.0920, -0.0355, 0.7797],
[ 0.7062, 0.1155, -0.1594],
[-0.0800, 0.7853, -0.0402]]], grad_fn=<SelectBackward0>)
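Averaging the two printed blocks element by element does reproduce the aggregated tensor; a quick check in plain Python, with the numbers copied from the output above:

```python
# values copied from the three printouts above
w0 = [[-0.0588, -0.4677, 0.7516], [0.7967, 0.6580, 0.2423], [-0.4919, 0.8133, 0.3545]]
w1 = [[-0.1252, 0.3967, 0.8079], [0.6157, -0.4270, -0.5611], [0.3319, 0.7574, -0.4349]]
agg = [[-0.0920, -0.0355, 0.7797], [0.7062, 0.1155, -0.1594], [-0.0800, 0.7853, -0.0402]]

for r0, r1, ra in zip(w0, w1, agg):
    for a, b, m in zip(r0, r1, ra):
        assert abs((a + b) / 2 - m) < 1e-3  # equal up to print rounding
```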
As you can see, the weights of the aggregated model are the mean of those of the first two. I then save the aggregated state dict over all three checkpoints, because I want the three models to start the next iteration with identical weights. So on the next call I expect this:
WEIGHTS MODEL BEFORE AGGREGATION:
tensor([[[-0.0920, -0.0355, 0.7797],
[ 0.7062, 0.1155, -0.1594],
[-0.0800, 0.7853, -0.0402]]], grad_fn=<SelectBackward0>)
WEIGHTS MODEL1 BEFORE AGGREGATION:
tensor([[[-0.0920, -0.0355, 0.7797],
[ 0.7062, 0.1155, -0.1594],
[-0.0800, 0.7853, -0.0402]]], grad_fn=<SelectBackward0>)
AGGREGATED MODEL WEIGHTS:
tensor([[[-0.0920, -0.0355, 0.7797],
[ 0.7062, 0.1155, -0.1594],
[-0.0800, 0.7853, -0.0402]]], grad_fn=<SelectBackward0>)
However, every time I call my function I get different weights:
WEIGHTS MODEL BEFORE AGGREGATION:
tensor([[[-0.6939, -0.1393, -0.4461],
[ 0.2242, 0.3700, 0.2389],
[-0.3851, -0.6902, 0.4063]]], grad_fn=<SelectBackward0>)
WEIGHTS MODEL1 BEFORE AGGREGATION:
tensor([[[-4.2956e-01, -8.7593e-02, 1.7659e-01],
[ 5.6529e-01, 6.6516e-01, 9.3472e-02],
[-7.2457e-01, 5.0922e-01, 7.7867e-07]]], grad_fn=<SelectBackward0>)
AGGREGATED MODEL WEIGHTS:
tensor([[[-0.5617, -0.1134, -0.1347],
[ 0.3947, 0.5176, 0.1662],
[-0.5549, -0.0905, 0.2031]]], grad_fn=<SelectBackward0>)
So the aggregation itself works, but I am probably doing something wrong in the save/load routine. It makes no sense to me: I save a model, yet completely different weights get loaded back…
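For comparison, the invariant I am relying on is just a save/load round trip. With pickle standing in for torch.save/torch.load on a plain dict (illustrative values, not my real checkpoint), the reloaded data is identical to what was written:

```python
import os
import pickle
import tempfile

# a stand-in "state dict"; I would expect torch.save/torch.load on a real
# state dict to behave the same way
state = {"block_1.0.weight": [-0.0920, -0.0355, 0.7797]}

path = os.path.join(tempfile.mkdtemp(), "aggregated.pt")
with open(path, "wb") as f:
    pickle.dump(state, f)
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded == state)  # True: the round trip preserves the values exactly
```

So unless the save and the next load somehow use different paths, I would expect the same behaviour from the torch checkpoints.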