Size mismatch error when running Transfer Learning tutorial

The default image size in the tutorial is 224x224. However, I wanted to try some other sizes (such as 225 or 256), so I only modified the following section:

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(256),  # changed from 224
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(256),  # changed from 224
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

But when I ran the demo again, I got this error message:

Traceback (most recent call last):
  File "C:/Users/Javis/PycharmProjects/pytorch_demo/", line 289, in <module>
  File "C:/Users/Javis/PycharmProjects/pytorch_demo/", line 187, in train_model
    outputs = model(inputs)
  File "C:\Users\Javis\AppData\Local\Continuum\Anaconda3\lib\site-packages\torch\nn\modules\", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Javis\AppData\Local\Continuum\Anaconda3\lib\site-packages\torchvision\models\", line 151, in forward
    x = self.fc(x)
  File "C:\Users\Javis\AppData\Local\Continuum\Anaconda3\lib\site-packages\torch\nn\modules\", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Javis\AppData\Local\Continuum\Anaconda3\lib\site-packages\torch\nn\modules\", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "C:\Users\Javis\AppData\Local\Continuum\Anaconda3\lib\site-packages\torch\nn\", line 835, in linear
    return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [4 x 2048], m2: [512 x 2] at c:\miniconda2\conda-bld\pytorch-cpu_1519449358620\work\torch\lib\th\generic/THTensorMath.c:1434

It's very strange: when I reset the image size to 224 or less than 224, it works normally again. Why can't I use a size larger than 224 in this tutorial?

Any help will be much appreciated!

Resnet is built to only take images of size 224x224.

Resnet consists of several layers of convolutions and poolings, after which an image of three colours and size 224x224 is transformed into an “image” of 512 features and size 1x1. This is then fed into a Linear layer that expects an input with 512 features.

If you input an image that is bigger than 224x224 then the convolution and pooling layers will transform it into an “image” with 512 features that is bigger than size 1x1 and there will be too many features for the Linear layer. For example, if the convolution and pooling output is of size 2x2, then that would make 2048 inputs for the Linear layer.

If you input an image that is smaller than 224x224, then the average pooling layer will complain about its calculated output size being negative.
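You can check both effects with the standard conv/pool output-size formula, floor((n + 2p - k) / s) + 1. The sketch below traces the spatial size through ResNet-18's downsampling stages (conv1, maxpool, and the three stride-2 stages) followed by the fixed AvgPool2d(7, stride=1) that older torchvision versions used; the function names are just for illustration:

```python
def conv_out(n, k, s, p=0):
    """Output spatial size of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

def resnet18_map_size(n):
    """Spatial side of the feature map that reaches ResNet-18's avgpool."""
    n = conv_out(n, k=7, s=2, p=3)      # conv1, stride 2
    n = conv_out(n, k=3, s=2, p=1)      # maxpool, stride 2
    for _ in range(3):                  # layer2-4 each halve the map
        n = conv_out(n, k=3, s=2, p=1)
    return n

for size in (192, 224, 256):
    side = resnet18_map_size(size)
    pooled = conv_out(side, k=7, s=1)   # the old fixed AvgPool2d(7, stride=1)
    print(size, side, pooled, 512 * pooled * pooled)
```

A 224 input gives a 7x7 map, pooled to 1x1, i.e. 512 features; a 256 input gives 8x8, pooled to 2x2, i.e. 2048 features (exactly the `m1: [4 x 2048]` in the traceback); a 192 input gives a 6x6 map, smaller than the 7x7 pooling kernel, which is why the pooling layer complains.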

Basically Resnet can’t be used with images that aren’t 224x224 unless you crop or rescale them so that they are 224x224.


I suppose you could replace the AvgPool2d layer with an AdaptiveAvgPool2d layer, like this…

loaded_resnet_model.avgpool = nn.AdaptiveAvgPool2d(1)

That would allow you to use different size images, but I can't guarantee that the model's performance won't be affected by different image sizes.
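To see why that fixes the mismatch: AdaptiveAvgPool2d(1) averages each channel's whole feature map down to a single value, so the Linear layer always receives 512 inputs no matter how big the map is. A toy single-channel sketch in plain Python (not torchvision code, just the idea):

```python
def adaptive_avg_pool_1x1(feature_map):
    """Collapse one channel's HxW grid to a single averaged value,
    mimicking what nn.AdaptiveAvgPool2d(1) does per channel."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)

# A 2x2 map (what a 256x256 input produces) and a 1x1 map (224x224 input)
# both collapse to one value per channel, so the Linear layer always sees
# 512 features regardless of input image size.
print(adaptive_avg_pool_1x1([[1.0, 2.0], [3.0, 4.0]]))  # 2.5
print(adaptive_avg_pool_1x1([[5.0]]))                   # 5.0
```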

Thanks for your reply, I will try it. But I'm still confused, because the torchvision documentation says:

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224

I have proposed a correction to the documentation.