Size mismatch after loading saved model. Please explain!

Hey guys, I am trying to load a trained CNN classifier that I saved so I can modify the linear layers, but I get a size mismatch error when performing a forward pass (train or eval, doesn’t matter). Here is the output:

Exception NameError: "global name 'FileNotFoundError' is not defined" in <bound method _DataLoaderIter.__del__ of <_DataLoaderIter object at 0x7fcbc02c3350>> ignored
Traceback (most recent call last):
  File "", line 169, in <module>
    outputs = F.softmax(old_model(images))
  File "/home/zswartz/.local/lib/python2.7/site-packages/torch/nn/modules/", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zswartz/.local/lib/python2.7/site-packages/torch/nn/modules/", line 91, in forward
    input = module(input)
  File "/home/zswartz/.local/lib/python2.7/site-packages/torch/nn/modules/", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zswartz/.local/lib/python2.7/site-packages/torch/nn/modules/", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/zswartz/.local/lib/python2.7/site-packages/torch/nn/", line 994, in linear
    output = input.matmul(weight.t())
RuntimeError: size mismatch, m1: [72 x 2], m2: [144 x 100] at /pytorch/aten/src/THC/generic/

If I completely strip away the linear layers, and just leave the conv layers, there is no size mismatch error. Keep in mind, I am using the same exact data loader that I used to train the network in the first place. This isn’t a huge deal because I plan to strip away the linear layers regardless, but I would like to verify that the frozen model performs as it was trained. I have a feeling that this may have something to do with the fact that I use “.view(-1, 144)” to flatten my final feature map in the forward method before the first linear layer, which is where this error is occurring.
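For context, here is a minimal sketch of the kind of mismatch in that error message, using only the shapes from the traceback (toy values, not the actual network):

```python
import torch
import torch.nn as nn

# The first linear layer expects 144 input features, as in m2: [144 x 100].
fc = nn.Linear(144, 100)

# A correctly flattened batch: 8 samples, 144 features each.
x = torch.randn(8, 144)
out = fc(x)  # works, shape [8, 100]

# An input whose trailing dimension is wrong reproduces the error,
# matching m1: [72 x 2] from the traceback.
bad = torch.randn(72, 2)
err = None
try:
    fc(bad)
except RuntimeError as e:
    err = e
print(type(err).__name__)  # RuntimeError
```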

Did you change the view() after loading the model?
If your model was fine before you saved it, it should also work after loading it.
The shape of m1 looks like it could be [144 x 1], but this is just a guess.

Could you explain a bit more how you saved and reloaded the model?
Also, the code would be interesting to see.

Hey! Thanks for the reply.

I use, filename) to save, and then I use:
model = train.Net()
in order to load the model.
I think the problem has something to do with the fact that I am using:
old_model = nn.Sequential(*list(model.children())).cuda()

after loading the model.

I have done it this way so I have the ability to create a sub-network by indexing model.children(), which works as long as I index up to but not including the first linear layer.
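A sketch of what I mean, with an assumed toy model standing in for my actual train.Net:

```python
import torch
import torch.nn as nn

# Assumed toy stand-in for the saved model (not the real architecture).
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 4, 3, padding=1),
    nn.Linear(4 * 6 * 6, 10),  # would need a flattened input to work
)

# Keep everything up to, but not including, the first linear layer.
children = list(model.children())
first_linear = next(i for i, m in enumerate(children) if isinstance(m, nn.Linear))
conv_part = nn.Sequential(*children[:first_linear])

# The conv-only sub-network runs fine on a 4D input.
x = torch.randn(2, 3, 6, 6)
features = conv_part(x)
print(features.shape)  # torch.Size([2, 4, 6, 6])
```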

I’m not sure how much liberty I have in sharing all of the code, but I can include snippets.

Thank you!

Could you check again, that the forward pass runs successfully:

x = torch.randn(YOUR_SIZE)
output = model(x), filename)
model = train.Net()
output = model(x)
old_model = nn.Sequential(...)
x ='cuda')
output = old_model(x)

So, the model runs perfectly as long as I don’t stick the layers together using nn.Sequential(…).

…Any ideas?

I’m under the impression that using sequential might mess with the flattening of the final feature map.

Thanks for the hint! You are absolutely right.

You are re-creating the model as an nn.Sequential module, so the view call you are probably using in forward will be missing.

You can fix this by inserting a Flatten module between your layers. Here is a small example:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Flatten(nn.Module):
    def __init__(self):
        super(Flatten, self).__init__()
    def forward(self, x):
        return x.view(x.size(0), -1)

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 3, 1, 1)
        self.fc1 = nn.Linear(6*24*24, 10)
        self.fc2 = nn.Linear(10, 2)
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = MyModel()
x = torch.randn(1, 3, 24, 24)
output = model(x)

layers = list(model.children())[:1] + [Flatten()] + list(model.children())[1:]
model = nn.Sequential(*layers)
output = model(x)

Awesome! Sorry, did you mean to include an instance of your Flatten class somewhere in your model?

Sorry, I didn't read all of it. Thank you!

Hey, sorry to resurface this issue, but I have a lingering question. Does stitching things together using sequential ignore everything that occurs in the forward method, including functional relus?

Yes, it does. nn.Sequential simply calls its child modules in order, so anything done with the functional API inside your original forward (like F.relu) is lost. For almost every function you can simply wrap it inside a torch module (and for some, such as ReLU, a module version already exists).
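For functions without a built-in module version, a generic wrapper might look like this (a sketch; Lambda is an assumed helper name, not a built-in PyTorch class):

```python
import torch
import torch.nn as nn

class Lambda(nn.Module):
    """Wraps an arbitrary function so it can live inside nn.Sequential."""
    def __init__(self, fn):
        super(Lambda, self).__init__()
        self.fn = fn

    def forward(self, x):
        return self.fn(x)

model = nn.Sequential(
    nn.Linear(4, 4),
    Lambda(lambda x: x * 2),           # any functional op
    Lambda(lambda x: x.clamp(min=0)),  # behaves like a ReLU
)

out = model(torch.randn(3, 4))
print(out.shape)  # torch.Size([3, 4])
```

Note that wrapping a plain lambda like this is fine for experimentation, but such a model cannot be pickled; for saving, define the function at module level.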

So, if I'm removing and adding layers of a saved model by using nn.Sequential, how would I reintroduce the ReLUs?

You could simply add an nn.ReLU() module wherever your forward used F.relu inside your sequential model.
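Putting it together with a Flatten module, a rebuilt model with the ReLUs reintroduced might look like this (a sketch with assumed toy layer sizes):

```python
import torch
import torch.nn as nn

class Flatten(nn.Module):
    def forward(self, x):
        return x.view(x.size(0), -1)

# Assumed toy layers; in practice these would come from the loaded model.
conv1 = nn.Conv2d(3, 6, 3, 1, 1)
fc1 = nn.Linear(6 * 24 * 24, 10)
fc2 = nn.Linear(10, 2)

# Interleave nn.ReLU() wherever the original forward used F.relu.
model = nn.Sequential(
    conv1, nn.ReLU(),
    Flatten(),
    fc1, nn.ReLU(),
    fc2,
)

output = model(torch.randn(1, 3, 24, 24))
print(output.shape)  # torch.Size([1, 2])
```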
