Size mismatch after loading saved model. Please explain!

Hey guys, I am trying to load a trained CNN classifier that I saved so I can modify the linear layers, but I get a size mismatch error when performing a forward pass (train or eval, doesn’t matter). Here is the output:

Exception NameError: "global name 'FileNotFoundError' is not defined" in <bound method _DataLoaderIter.__del__ of <_DataLoaderIter object at 0x7fcbc02c3350>> ignored
Traceback (most recent call last):
  File "", line 169, in <module>
    outputs = F.softmax(old_model(images))
  File "/home/zswartz/.local/lib/python2.7/site-packages/torch/nn/modules/", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zswartz/.local/lib/python2.7/site-packages/torch/nn/modules/", line 91, in forward
    input = module(input)
  File "/home/zswartz/.local/lib/python2.7/site-packages/torch/nn/modules/", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zswartz/.local/lib/python2.7/site-packages/torch/nn/modules/", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/zswartz/.local/lib/python2.7/site-packages/torch/nn/", line 994, in linear
    output = input.matmul(weight.t())
RuntimeError: size mismatch, m1: [72 x 2], m2: [144 x 100] at /pytorch/aten/src/THC/generic/

If I completely strip away the linear layers, and just leave the conv layers, there is no size mismatch error. Keep in mind, I am using the same exact data loader that I used to train the network in the first place. This isn’t a huge deal because I plan to strip away the linear layers regardless, but I would like to verify that the frozen model performs as it was trained. I have a feeling that this may have something to do with the fact that I use “.view(-1, 144)” to flatten my final feature map in the forward method before the first linear layer, which is where this error is occurring.
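For context, here is a minimal sketch of the kind of mismatch in that error message, using only the shapes from the traceback (toy values, not the actual network):

```python
import torch
import torch.nn as nn

# The first linear layer expects 144 input features, as in m2: [144 x 100].
fc = nn.Linear(144, 100)

# A correctly flattened batch: 8 samples, 144 features each.
x = torch.randn(8, 144)
out = fc(x)  # works, shape [8, 100]

# An input whose trailing dimension is wrong reproduces the error,
# matching m1: [72 x 2] from the traceback.
bad = torch.randn(72, 2)
err = None
try:
    fc(bad)
except RuntimeError as e:
    err = e
print(type(err).__name__)  # RuntimeError
```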

Did you change the view() after loading the model?
If your model was fine before you saved it, it should also work after loading it.
The shape of m1 looks like it could be [144 x 1], but this is just a guess.

Could you explain a bit more how you saved and reloaded the model?
Also, the code would be interesting to see.

Hey! Thanks for the reply.

I use, filename) to save, and then I use:
model = train.Net()
in order to load the model.
I think the problem has something to do with the fact that I am using:
old_model = nn.Sequential(*list(model.children())).cuda()

after loading the model.

I have done it this way so I have the ability to create a sub-network by indexing model.children(), which works as long as I index up to but not including the first linear layer.
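A sketch of what I mean, with an assumed toy model standing in for my actual train.Net:

```python
import torch
import torch.nn as nn

# Assumed toy stand-in for the saved model (not the real architecture).
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 4, 3, padding=1),
    nn.Linear(4 * 6 * 6, 10),  # would need a flattened input to work
)

# Keep everything up to, but not including, the first linear layer.
children = list(model.children())
first_linear = next(i for i, m in enumerate(children) if isinstance(m, nn.Linear))
conv_part = nn.Sequential(*children[:first_linear])

# The conv-only sub-network runs fine on a 4D input.
x = torch.randn(2, 3, 6, 6)
features = conv_part(x)
print(features.shape)  # torch.Size([2, 4, 6, 6])
```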

I’m not sure how much liberty I have in sharing all of the code, but I can include snippets.

Thank you!

Could you check again, that the forward pass runs successfully:

x = torch.randn(YOUR_SIZE)
output = model(x), filename)
model = train.Net()
output = model(x)
old_model = nn.Sequential(...)
x ='cuda')
output = old_model(x)

So, the model runs perfectly as long as I don’t stick the layers together using nn.Sequential(…).

…Any ideas?

I’m under the impression that using sequential might mess with the flattening of the final feature map.

Thanks for the hint! You are absolutely right.

You are re-creating the model as an nn.Sequential module, so the view call you are probably using in forward will be missing.

You can fix this by inserting a Flatten module between your layers. Here is a small example:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Flatten(nn.Module):
    def __init__(self):
        super(Flatten, self).__init__()
    def forward(self, x):
        return x.view(x.size(0), -1)

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 3, 1, 1)
        self.fc1 = nn.Linear(6*24*24, 10)
        self.fc2 = nn.Linear(10, 2)
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = MyModel()
x = torch.randn(1, 3, 24, 24)
output = model(x)

layers = list(model.children())[:1] + [Flatten()] + list(model.children())[1:]
model = nn.Sequential(*layers)
output = model(x)

Awesome! Sorry, did you mean to include an instance of your Flatten class somewhere in your model?

Sorry, I didn't read all of it. Thank you!

Hey, sorry to resurface this issue, but I have a lingering question. Does stitching things together using sequential ignore everything that occurs in the forward method, including functional relus?

Yes, it does. nn.Sequential simply calls its child modules in order, so anything done with the functional API inside your original forward (like F.relu) is lost. For almost every function you can simply wrap it inside a torch module (and for some, such as ReLU, a module version already exists).
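For functions without a built-in module version, a generic wrapper might look like this (a sketch; Lambda is an assumed helper name, not a built-in PyTorch class):

```python
import torch
import torch.nn as nn

class Lambda(nn.Module):
    """Wraps an arbitrary function so it can live inside nn.Sequential."""
    def __init__(self, fn):
        super(Lambda, self).__init__()
        self.fn = fn

    def forward(self, x):
        return self.fn(x)

model = nn.Sequential(
    nn.Linear(4, 4),
    Lambda(lambda x: x * 2),           # any functional op
    Lambda(lambda x: x.clamp(min=0)),  # behaves like a ReLU
)

out = model(torch.randn(3, 4))
print(out.shape)  # torch.Size([3, 4])
```

Note that wrapping a plain lambda like this is fine for experimentation, but such a model cannot be pickled; for saving, define the function at module level.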

So, if I'm removing and adding layers of a saved model by using nn.Sequential, how would I reintroduce the ReLUs?

You could simply add an nn.ReLU() module wherever your forward used F.relu inside your sequential model.
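Putting it together with a Flatten module, a rebuilt model with the ReLUs reintroduced might look like this (a sketch with assumed toy layer sizes):

```python
import torch
import torch.nn as nn

class Flatten(nn.Module):
    def forward(self, x):
        return x.view(x.size(0), -1)

# Assumed toy layers; in practice these would come from the loaded model.
conv1 = nn.Conv2d(3, 6, 3, 1, 1)
fc1 = nn.Linear(6 * 24 * 24, 10)
fc2 = nn.Linear(10, 2)

# Interleave nn.ReLU() wherever the original forward used F.relu.
model = nn.Sequential(
    conv1, nn.ReLU(),
    Flatten(),
    fc1, nn.ReLU(),
    fc2,
)

output = model(torch.randn(1, 3, 24, 24))
print(output.shape)  # torch.Size([1, 2])
```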
