How do I feed higher res images to model zoo models?

Halp! I am trying to feed 512x512 and higher resolution images to ResNet models in the TorchVision zoo but it keeps throwing size mismatch errors at me.

Can someone please explain what needs to be changed in the following model definition to feed it 512x512 or 1024x1024 images? I am coming from Keras and a bit lost here. Thank you!

ResNet model definition: https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py

The error seems to occur at line 151, in forward: x = self.fc(x)

The size mismatch is coming from the output of the last avgpool layer feeding into the input of the resnet.fc module. That version of resnet.py uses a fixed 7x7 average pool sized for 224x224 inputs, so larger images leave a larger feature map behind and the flattened vector no longer matches fc's expected input dimension. To feed it larger images you will need to strip the existing fully connected layer and replace it with one that has a larger input dimension.

something like:

import torch.nn as nn
from torchvision import models

your_net = models.resnet18(pretrained=True)

new_num_features = ...  # something bigger than 512, depending on the input size

your_net.fc = nn.Linear(new_num_features, 1000)
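
For reference (my arithmetic, assuming the fixed 7x7 average pool in the linked resnet.py): a 512x512 input comes out of layer4 as a 16x16 map, the 7x7 pool reduces that to 10x10, so the flattened vector has 512 x 10 x 10 = 51200 features. For 1024x1024 the same arithmetic gives 512 x 26 x 26 = 346112.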

This will create untrained parameters, though. To use the pretrained models unchanged, I think you are restricted to image sizes smaller than the ones you are using.

The easiest way that I’ve found to deal with this is to temporarily modify the resnet.py file to print x.size() right before the x = self.fc(x) line.

This will show you the dimensions of the tensor produced by whatever image size you are using.
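
If you’d rather not edit the installed file, a forward hook on avgpool will print the same thing without touching resnet.py. A minimal sketch (the print_size hook and the dummy input are just illustrations):

import torch
from torchvision import models

net = models.resnet18(pretrained=True)

# print the shape of whatever leaves avgpool, right before it is
# flattened and handed to fc
def print_size(module, input, output):
    print(output.size())  # e.g. torch.Size([1, 512, 10, 10]) for a 512x512 input

net.avgpool.register_forward_hook(print_size)

try:
    net(torch.randn(1, 3, 512, 512))
except RuntimeError:
    pass  # fc still mismatches at this size; the hook prints before the crash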

Thank you @tymokvo - that was exactly the pointer I needed! I have a high-res test working smoothly now.

🙌

Hello,

Which dataset are the CNN models in PyTorch, like ResNet, VGG, and AlexNet, pretrained on?

Would you be willing to share your code to do that?

Thanks

It should be possible to change the last pooling layer so that it guarantees the output is always the correct size. See https://github.com/pytorch/vision/issues/140
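
Something along these lines should work (a sketch assuming torchvision’s ResNet, where the pooled tensor is flattened straight into fc):

import torch
import torch.nn as nn
from torchvision import models

net = models.resnet18(pretrained=True)

# swap the fixed 7x7 average pool for an adaptive one that always emits
# a 1x1 spatial map, whatever the input resolution
net.avgpool = nn.AdaptiveAvgPool2d(1)

# the flattened vector is now always 512-dim for resnet18, so the
# pretrained fc layer and its weights can stay as-is
out = net(torch.randn(1, 3, 512, 512))
print(out.size())  # torch.Size([1, 1000])

This keeps all of the pretrained weights usable at 512x512, though accuracy at resolutions far from 224x224 isn’t guaranteed.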

ImageNet. https://github.com/pytorch/vision#models

@tymokvo
@FuriouslyCurious

I’d like to train the Inception v3 model with high-resolution images like 512x512. Do you know how to do it?