Different output sizes from a Caffe-pretrained VGG16 and the torchvision pretrained model

I am working on converting a legacy torch (Lua-based) project to PyTorch. The authors of the legacy code used a VGG16 model pretrained in the Caffe framework for their experiments, while I used the torchvision pretrained model. I want to pass a 28x28x3 image through the feature extractors of both models, but I get different output tensor sizes. The Caffe model returns a 512x2x2 (CHW) tensor, whereas the torchvision one returns 512x1x1. I printed the model submodules and they match each other exactly. What should I do?
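For reference, here is a minimal sketch of the torchvision side. The truncation point (keeping vgg16.features up to conv5_3, i.e. the first 30 modules, before pool5) is just an assumption for illustration; adjust the slice to whatever layers the legacy model actually uses.

```python
import torch
from torchvision import models

# Minimal sketch: run a 28x28x3 image through the torchvision VGG16 feature
# extractor. Truncating at the first 30 modules (everything up to conv5_3/ReLU,
# before pool5) is an assumption for illustration.
vgg = models.vgg16(pretrained=True)  # or weights=... on newer torchvision
feature_extractor = torch.nn.Sequential(*list(vgg.features.children())[:30]).eval()

x = torch.randn(1, 3, 28, 28)        # stand-in for the 28x28x3 image (NCHW)
with torch.no_grad():
    out = feature_extractor(x)
print(out.shape)  # torch.Size([1, 512, 1, 1]) with PyTorch's default floor-mode pooling
```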

@emross3371 It looks like you are missing some operation before or after the VGG pass. If you provide the source code, we can check.

This is the code I'm trying to modify: VisualSearchZeroShot/IVSNtopdown_30_31_array.lua at master · kreimanlab/VisualSearchZeroShot · GitHub. This is the output of the PyTorch submodel: https://imgur.com/NJh5iC7, and I will post the output of the Lua code as a reply in this thread. The funny thing is that when I pass the standard 224x224x3 input, the PyTorch and Caffe models agree; the output sizes only differ for the smaller 28x28x3 image.

https://imgur.com/9MWr4sl - model submodules in the Caffe model

As far as I can see, only simple mean subtraction is done as preprocessing; please correct me if I'm wrong (I am new to legacy torch).
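If it helps, this is my understanding of the Caffe-style preprocessing, as a sketch. The BGR mean values are the commonly used VGG ImageNet means, so please check them against the Lua script; note that torchvision's own pretrained weights instead expect RGB in [0, 1] normalized with the ImageNet mean/std.

```python
import torch

# Sketch of Caffe-style VGG preprocessing: BGR channel order, pixel values in
# [0, 255], per-channel mean subtraction, no division by std. The mean values
# below are the commonly used VGG ImageNet means; verify against the Lua code.
VGG_MEAN_BGR = torch.tensor([103.939, 116.779, 123.68]).view(3, 1, 1)

def caffe_preprocess(img_rgb):
    """img_rgb: HxWx3 RGB image with values in [0, 255] (uint8 or float)."""
    x = torch.as_tensor(img_rgb, dtype=torch.float32).permute(2, 0, 1)  # HWC -> CHW
    x = x[[2, 1, 0]]                 # RGB -> BGR, as Caffe models expect
    x = x - VGG_MEAN_BGR             # mean subtraction only
    return x.unsqueeze(0)            # add batch dimension -> 1x3xHxW
```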

@emross3371 This can be explained. With a very small image, there are barely any pixels left by the later layers, and at that point the two frameworks round the pooled sizes differently: Caffe pools with ceil mode by default, while PyTorch's MaxPool2d defaults to floor mode, which is most likely why you see 2x2 versus 1x1 at 28x28 but identical shapes at 224x224. Every image-processing model expects a particular input size, so if 224 works fine, resize the 28x28x3 image to 224x224x3 using torchvision's Resize (see the sketch after the link below).

https://pytorch.org/vision/stable/transforms.html#torchvision.transforms.Resize
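Roughly like this, as a sketch (tensor input to Resize requires torchvision >= 0.8; for PIL images apply the same transform before ToTensor):

```python
import torch
from torchvision import transforms

# Rough sketch of the suggested fix: upsample the 28x28 image to the 224x224
# resolution VGG16 was trained on, then run it through the feature extractor.
resize = transforms.Resize((224, 224))

x = torch.randn(1, 3, 28, 28)   # stand-in for the small image (NCHW)
x224 = resize(x)                # -> torch.Size([1, 3, 224, 224])
print(x224.shape)
```

If you are already working with batched tensors, torch.nn.functional.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False) does the same job.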