How can torchvision.models deal with images whose size is not 224x224?

681408ed5ec9ed11af98 · July 19, 2019, 2:06pm

Hi!
Are ResNet, DenseNet, and the other models in torchvision.models all designed for 224x224 images?
How can I use them for other image tasks where the image size is not 224x224?
These days I’m trying to use torchvision.models for a CIFAR-10 classification task, and I have already tried some methods to overcome the image size mismatch:

Resize the images to 224x224. But I found that training takes much longer than with 32x32 images.
In the ResNet model, I changed avgpool to an adaptive average pool, like this: model.avgpool = t.nn.AdaptiveAvgPool2d(1). But I found that it doesn’t perform well: the best validation accuracy is just 84%, while I have seen others reach 92%. So what’s the problem with this method?
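
For reference, a minimal sketch of the first option and of why it is slower. It uses a random tensor as a stand-in for a CIFAR-10 batch (an assumption, not real data) and `torch.nn.functional.interpolate` for the resize; a dataset transform would be the more usual place to do this.

```python
import torch
import torch.nn.functional as F

# Stand-in for a CIFAR-10 batch: 8 RGB images of 32x32 (random values, not real images).
batch = torch.randn(8, 3, 32, 32)

# Upsample to 224x224, the resolution the ImageNet models were trained on.
resized = F.interpolate(batch, size=(224, 224), mode="bilinear", align_corners=False)

# Each resized image has 49 times as many pixels as the original,
# so every convolution has correspondingly more positions to process.
pixel_ratio = (224 * 224) // (32 * 32)

print(resized.shape, pixel_ratio)
```

The factor of 49 in pixel count is a rough indication of the extra compute per image, which matches the longer training time described above.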

So I would like to know whether there are better ways to deal with this problem. I’d be really grateful for any advice or help.

ptrblck · July 20, 2019, 12:05am

A lot of pretrained models were trained with an image size of 224x224, so you might want to use a custom architecture for smaller images (or resize them as you’ve already done).

The current resnet implementation already uses an adaptive pooling layer, so your change shouldn’t make a difference regarding the performance.

681408ed5ec9ed11af98 · July 20, 2019, 7:35am

Thank you for your advice!
In addition, I found that resizing the images and then using the pretrained resnet18 achieves about 94% accuracy on CIFAR-10, but with the adaptive pooling layer the best performance is only 84%. Why are they so different?

justusschock · July 20, 2019, 5:34pm

Simply put:

The difference here is that the kernels have been trained for a specific image size (224), so the features they extract usually have a certain size.

If you use smaller images, the kernels might not be able to extract the features at the usual size, since they are smaller (or larger), which may result in a difference in performance.

If your input size differs too much from 224x224 pixels, the results may even become much worse.

681408ed5ec9ed11af98 · July 21, 2019, 1:05pm

Thank you very much! I understand what you said.