I trained a model that, among other layers, had the following layer:
final_layer.append(nn.Conv2d(64, 1, kernel_size=1))
and then saved it to a file with state_dict and torch.save.
Later, when I wanted to load that model using load_state_dict, the same layer had by accident been set up as follows:
final_layer.append(nn.Conv2d(64, 32, kernel_size=1))
Nevertheless, the model loaded without error. It looks as though the weights were simply duplicated 32 times, though I have not verified that.

So my question is: how is this consistent with the API documentation? I have not found any statement saying that load_state_dict would somehow fix shape inconsistencies automatically. If that is what actually happens, there is a serious documentation vs. reality mismatch, which would qualify for a GitHub bug report — but I first wanted to ask whether I missed anything before filing one.

You might argue “what’s wrong with this? It’s a good thing that PyTorch corrects inconsistencies automatically.” I would say this is not good if you do serious research, because the model might end up different from what you think it is, and your conclusions can become invalid. At the very least, one should know exactly what kind of magic correction happens inside each function.
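One plausible explanation (my assumption, not verified against the PyTorch source) is that the loader copies each saved parameter into the target buffer with an in-place copy that follows broadcasting rules. A saved weight of shape (1, 64, 1, 1) would then be silently tiled 32 times into a (32, 64, 1, 1) buffer, which would match the duplication I think I observed. A minimal NumPy sketch of that broadcasting behaviour:

```python
import numpy as np

# Saved weight from Conv2d(64, 1, kernel_size=1): shape (1, 64, 1, 1)
saved = np.random.rand(1, 64, 1, 1)

# Target buffer for Conv2d(64, 32, kernel_size=1): shape (32, 64, 1, 1)
target = np.zeros((32, 64, 1, 1))

# An in-place assignment with broadcasting tiles the single
# output channel across all 32 output channels of the target.
target[...] = saved

# Every output channel now holds an identical copy of the saved weight.
print(all(np.array_equal(target[i], saved[0]) for i in range(32)))
```

If that is the mechanism, one could check the loaded model the same way: compare the slices of the layer's weight tensor along the output-channel dimension and see whether they are all identical.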
P.S. This is PyTorch release 0.4.