Two models with same weights, different results

Hello,

I have two models that are supposed to be copies of each other, but perform differently. Look at this:

>>> repr(model1) == repr(model2) #have the same structure
True
>>> for idx, (p1, p2) in enumerate(zip(model1.named_parameters(), model2.named_parameters())):
...     if not p1[0] == p2[0]:
...         print('different parameter order for idx {}'.format(idx))
...     if not torch.equal(p1[1].data, p2[1].data):
...         print('idx {} not equal'.format(idx))
# nothing is printed, so the parameter names and values all match
>>> evaluateModel(model1, test_loader, fastEvaluation=False)
0.8836
>>> evaluateModel(model2, test_loader, fastEvaluation=False)
0.8735

What could be the problem? They are instances of the same class, created with the same arguments, and they have the same structure. The only thing I do is change the weights of model1 in a certain way and then invert those changes. The point is that the two models end up with the same weights, so I expect them to perform identically.

Why doesn't this happen? What other fields should I check to make sure the models really are the same?

P.S. note that evaluateModel automatically calls eval() on the input model, so this can’t be a train vs eval mode difference.

I found the problem. It turns out that if you have batch normalization layers, you need to keep track of the running mean and running variance. These values don't show up in the model's parameter list (they are registered as buffers, not parameters) but are important at test time.

They are present in the model state dict, though, which is how I found out they were different for the two models.
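
To make this concrete, here is a minimal sketch (a toy model of my own choosing, just for illustration) showing that the running statistics appear in named_buffers() and state_dict() but not in named_parameters(); the exact key list depends on your PyTorch version:

import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 10), nn.BatchNorm1d(10))

# named_parameters() only lists the learnable weights and biases
print([name for name, _ in model.named_parameters()])
# -> ['0.weight', '0.bias', '1.weight', '1.bias']

# the running statistics are buffers, so they only appear here
print([name for name, _ in model.named_buffers()])
# -> ['1.running_mean', '1.running_var', '1.num_batches_tracked']

# state_dict() contains both parameters and buffers
print(list(model.state_dict().keys()))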

Yeah, you should call
model.eval()
when you evaluate the model's performance. Eval mode makes a difference when the model contains dropout or batch normalization layers.
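
For completeness, the usual evaluation pattern looks something like this (a sketch; test_input just stands for a batch of data):

model.eval()               # switch dropout / batch norm to inference behaviour
with torch.no_grad():      # gradients are not needed for evaluation
    output = model(test_input)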

Hi,

I think you have misunderstood. I was already calling model.eval() on both models; the issue was that, while the models' parameters were indeed the same, the running mean and running var buffers of their batch normalization layers were not.

So when a model has batch normalization layers, to determine whether two models are the same you can't just compare their parameters; you also have to compare the running mean and running var of the batch normalization layers.
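
For example, a quick way to do this is to compare the buffers directly, in addition to the parameters (a sketch; buffers_match is just a hypothetical helper, and model1/model2 are the two models from above):

import torch

def buffers_match(m1, m2):
    # named_buffers() yields the running_mean / running_var (and num_batches_tracked) tensors
    for (name_1, buf_1), (name_2, buf_2) in zip(m1.named_buffers(), m2.named_buffers()):
        if name_1 != name_2 or not torch.equal(buf_1, buf_2):
            print('buffer mismatch at', name_1)
            return False
    return True

buffers_match(model1, model2)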

Or, in general, just check the state_dict, which contains everything :)

I got it. Thanks so much. I will write fewer bugs.

I also recently faced the same problem. As suggested by @antspy, the trick is to compare their state_dict(), and I found that all the running_mean and running_var buffers differ, hence causing the mismatch. Just in case anybody needs the code, here it is -

import torch

def compare_models(model_1, model_2):
    """Compare two models entry by entry via their state_dicts."""
    models_differ = 0
    for key_item_1, key_item_2 in zip(model_1.state_dict().items(), model_2.state_dict().items()):
        if torch.equal(key_item_1[1], key_item_2[1]):
            pass
        else:
            models_differ += 1
            if key_item_1[0] == key_item_2[0]:
                # same key, different values (e.g. mismatching running stats)
                print('Mismatch found at', key_item_1[0])
            else:
                # keys differ, so the state_dicts are not aligned at all
                raise Exception('The two models do not have matching state_dict keys.')
    if models_differ == 0:
        print('Models match perfectly! :)')
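
Usage is simply compare_models(model1, model2); in the scenario above it prints a 'Mismatch found at ...' line for every differing running_mean / running_var entry. Note that zip() stops at the shorter of the two state_dicts, so this implicitly assumes both models have the same architecture.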

Just a little addition to also check whether the tensors live on the same device:

import torch

def compare_models(model_1, model_2):
    """Same comparison as above, but also flag tensors stored on different devices."""
    models_differ = 0
    for key_item_1, key_item_2 in zip(model_1.state_dict().items(), model_2.state_dict().items()):
        # the device check short-circuits, so torch.equal is only called on same-device tensors
        if key_item_1[1].device == key_item_2[1].device and torch.equal(key_item_1[1], key_item_2[1]):
            pass
        else:
            models_differ += 1
            if key_item_1[0] == key_item_2[0]:
                _device = f'device {key_item_1[1].device}, {key_item_2[1].device}' if key_item_1[1].device != key_item_2[1].device else ''
                print(f'Mismatch {_device} found at', key_item_1[0])
            else:
                raise Exception('The two models do not have matching state_dict keys.')
    if models_differ == 0:
        print('Models match perfectly! :)')
