I have three models: model, model1, and aggregated_model. The weights of aggregated_model are the mean of the weights of the first two.
In my function I have this:
import torch

PATH = args.model
PATH1 = args.model1
PATHAGG = args.model_agg

model = VGG16(1)
model1 = VGG16(1)
aggregated_model = VGG16(1)

# load_state_dict copies the weights in place; its return value is only a
# missing/unexpected-keys report, so there is no need to keep it
model.load_state_dict(torch.load(PATH))
model1.load_state_dict(torch.load(PATH1))

print("WEIGHTS MODEL BEFORE AGGREGATION:")
print(list(model.block_1[0].parameters())[0][0])
print("WEIGHTS MODEL1 BEFORE AGGREGATION:")
print(list(model1.block_1[0].parameters())[0][0])

# average the weights of model and model1 into aggregated_model
init_params(aggregated_model, model, model1)

# save the aggregated weights, then overwrite both input checkpoints so
# that the next iteration starts from three identical models
torch.save(aggregated_model.state_dict(), PATHAGG)
print("AGGREGATED MODEL WEIGHTS:")
print(list(aggregated_model.block_1[0].parameters())[0][0])
torch.save(aggregated_model.state_dict(), PATH)
torch.save(aggregated_model.state_dict(), PATH1)
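I don't show init_params here, but it just averages the corresponding parameters of the two models into aggregated_model. A minimal dependency-free sketch of that kind of averaging (the name average_state_dicts is mine, and plain Python lists stand in for tensors; with real torch state dicts this would be roughly `{k: (sd0[k] + sd1[k]) / 2 for k in sd0}`):

```python
def average_state_dicts(sd0, sd1):
    """Elementwise mean of two 'state dicts' (name -> list of floats)."""
    return {k: [(a + b) / 2 for a, b in zip(sd0[k], sd1[k])] for k in sd0}

# illustrative key and values, not the real VGG16 state dict
sd0 = {"block_1.0.weight": [-0.0588, -0.4677, 0.7516]}
sd1 = {"block_1.0.weight": [-0.1252, 0.3967, 0.8079]}

avg = average_state_dicts(sd0, sd1)
print(avg["block_1.0.weight"])  # each entry is the mean of the corresponding inputs
```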
So, basically, I pass my function three paths pointing to the state dicts saved by the previous iteration. I construct three freshly (randomly) initialised models so that I can load those state dicts into them. Then I perform the aggregation, and it works:
WEIGHTS MODEL BEFORE AGGREGATION:
tensor([[[-0.0588, -0.4677, 0.7516],
[ 0.7967, 0.6580, 0.2423],
[-0.4919, 0.8133, 0.3545]]], grad_fn=<SelectBackward0>)
WEIGHTS MODEL1 BEFORE AGGREGATION:
tensor([[[-0.1252, 0.3967, 0.8079],
[ 0.6157, -0.4270, -0.5611],
[ 0.3319, 0.7574, -0.4349]]], grad_fn=<SelectBackward0>)
AGGREGATED MODEL WEIGHTS:
tensor([[[-0.0920, -0.0355, 0.7797],
[ 0.7062, 0.1155, -0.1594],
[-0.0800, 0.7853, -0.0402]]], grad_fn=<SelectBackward0>)
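Averaging the two printed blocks element by element does reproduce the aggregated tensor; a quick check in plain Python, with the numbers copied from the output above:

```python
# values copied from the three printouts above
w0 = [[-0.0588, -0.4677, 0.7516], [0.7967, 0.6580, 0.2423], [-0.4919, 0.8133, 0.3545]]
w1 = [[-0.1252, 0.3967, 0.8079], [0.6157, -0.4270, -0.5611], [0.3319, 0.7574, -0.4349]]
agg = [[-0.0920, -0.0355, 0.7797], [0.7062, 0.1155, -0.1594], [-0.0800, 0.7853, -0.0402]]

for r0, r1, ra in zip(w0, w1, agg):
    for a, b, m in zip(r0, r1, ra):
        assert abs((a + b) / 2 - m) < 1e-3  # equal up to print rounding
```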
As you can see, the weights of the aggregated model are the mean of those of the first two. I then save the aggregated state dict over all three checkpoints, because I want the three models to start the next iteration with identical weights. So on the next call I expect this:
WEIGHTS MODEL BEFORE AGGREGATION:
tensor([[[-0.0920, -0.0355, 0.7797],
[ 0.7062, 0.1155, -0.1594],
[-0.0800, 0.7853, -0.0402]]], grad_fn=<SelectBackward0>)
WEIGHTS MODEL1 BEFORE AGGREGATION:
tensor([[[-0.0920, -0.0355, 0.7797],
[ 0.7062, 0.1155, -0.1594],
[-0.0800, 0.7853, -0.0402]]], grad_fn=<SelectBackward0>)
AGGREGATED MODEL WEIGHTS:
tensor([[[-0.0920, -0.0355, 0.7797],
[ 0.7062, 0.1155, -0.1594],
[-0.0800, 0.7853, -0.0402]]], grad_fn=<SelectBackward0>)
However, every time I call my function I get different weights:
WEIGHTS MODEL BEFORE AGGREGATION:
tensor([[[-0.6939, -0.1393, -0.4461],
[ 0.2242, 0.3700, 0.2389],
[-0.3851, -0.6902, 0.4063]]], grad_fn=<SelectBackward0>)
WEIGHTS MODEL1 BEFORE AGGREGATION:
tensor([[[-4.2956e-01, -8.7593e-02, 1.7659e-01],
[ 5.6529e-01, 6.6516e-01, 9.3472e-02],
[-7.2457e-01, 5.0922e-01, 7.7867e-07]]], grad_fn=<SelectBackward0>)
AGGREGATED MODEL WEIGHTS:
tensor([[[-0.5617, -0.1134, -0.1347],
[ 0.3947, 0.5176, 0.1662],
[-0.5549, -0.0905, 0.2031]]], grad_fn=<SelectBackward0>)
So the aggregation itself works, but I am probably doing something wrong in the save/load routine. It makes no sense to me: I save a model, yet completely different weights get loaded back…
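For comparison, the invariant I am relying on is just a save/load round trip. With pickle standing in for torch.save/torch.load on a plain dict (illustrative values, not my real checkpoint), the reloaded data is identical to what was written:

```python
import os
import pickle
import tempfile

# a stand-in "state dict"; I would expect torch.save/torch.load on a real
# state dict to behave the same way
state = {"block_1.0.weight": [-0.0920, -0.0355, 0.7797]}

path = os.path.join(tempfile.mkdtemp(), "aggregated.pt")
with open(path, "wb") as f:
    pickle.dump(state, f)
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded == state)  # True: the round trip preserves the values exactly
```

So unless the save and the next load somehow use different paths, I would expect the same behaviour from the torch checkpoints.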