Pre-trained models in PyTorch behave differently from Keras

In the forward pass of a pre-trained model from torchvision used to classify an image, the zero padding in the convolutional layers leads to the values at the boundary positions of the feature maps becoming much larger than the values at other positions.
In other words, the classification score mainly depends on the boundary pixels of the image instead of the object in the image. With other pre-trained models, e.g., from Keras, this is not the case.
What do you think about it? Am I wrong?


Why would the border values be much bigger? Since it is padded with 0s, you just add 0s to the result.
Also, the pre-trained models follow the specifications from the papers that presented them. So if the original paper had padding, the PyTorch model will have padding. And I expect that Keras models have padding as well.

Sure, the models are the same as in the original papers or in other frameworks (e.g., Keras). However, PyTorch uses a different data preprocessing pipeline: it loads an image with values in the range [0, 1] and then normalizes it using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. After preprocessing, the values are in the range [-2.1179, 2.64].
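The quoted range follows directly from that normalization: the extreme values come from mapping the inputs 0 and 1 through (x - mean) / std for each channel:

```python
# Per-channel normalization constants used by torchvision's pre-trained models.
means = [0.485, 0.456, 0.406]
stds = [0.229, 0.224, 0.225]

# An input in [0, 1] is mapped to (x - mean) / std per channel, so the
# extremes over all channels are:
lows = [(0.0 - m) / s for m, s in zip(means, stds)]
highs = [(1.0 - m) / s for m, s in zip(means, stds)]
print(min(lows), max(highs))  # ≈ -2.1179, 2.64
```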
If the feature maps are padded with 0s and passed into a convolutional layer, the boundary pixels of the new feature maps can become bigger, since the feature maps or the weights can be negative. An easy way to check this is to look at the max_pool2d indices (especially in the lower layers): you will find that a lot of the indices lie on the boundary of the feature maps.
It has nothing to do with model accuracy. I mean that the classification score is partially contributed by the large boundary values, and that is not what we expect. Looking forward to your reply :grinning:!

If you assume that the input values are centered at 0 (which is what the renormalization is doing), then you have ~ as many positive and negative values. So replacing them by 0 shouldn’t change much.
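A quick NumPy sanity check of this argument (a sketch with one hypothetical random 3x3 kernel on a zero-centered random input): with zero padding, border outputs sum fewer nonzero terms than interior ones, so on average their magnitude should actually be smaller, not larger:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.standard_normal((200, 200))  # zero-centered input, like normalized pixels
w = rng.standard_normal((3, 3))        # one random 3x3 kernel (hypothetical)

# Zero-pad by 1 and cross-correlate with the kernel.
pad = np.zeros((202, 202))
pad[1:-1, 1:-1] = img
out = np.zeros((200, 200))
for di in range(3):
    for dj in range(3):
        out += w[di, dj] * pad[di:di + 200, dj:dj + 200]

# Compare mean |activation| on the 1-pixel border frame vs. the interior.
mask = np.zeros((200, 200), dtype=bool)
mask[0, :] = mask[-1, :] = mask[:, 0] = mask[:, -1] = True
border_mean = np.abs(out[mask]).mean()
interior_mean = np.abs(out[~mask]).mean()
print(border_mean, interior_mean)
```

With a zero-mean input, the border sums have lower variance (fewer terms), so `border_mean` comes out below `interior_mean` here; any border dominance in a real network would therefore have to come from the learned weights, not from the padding arithmetic alone.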

The max pooling is done on a 2x2 patch, and the convolutions generally use a 3x3 kernel with padding 1.
So you mean that for example if we look at a patch on the left side of the image, the two values on the left are more often bigger than the ones on the right? Would you have code to check that? That is interesting.

Exactly! My research interest is explainable deep learning. Recently, I started moving to PyTorch and implemented some backpropagation approaches, which is how I found this issue. In Keras, it is not the case. My project code is too long to read here, but you can just load a pre-trained model and add a forward hook for a max_pool2d layer (ideally the first one). In the hook you have the input feature map and the pooling parameters, so recalculate the pooling with return_indices=True and then unpool the output with those indices. If you print the unpooling result, you will find that the values often lie on the boundary.
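A minimal sketch of the check described above. To keep it self-contained it uses a tiny untrained stand-in for the first conv/pool block rather than an actual pre-trained network (swap in a torchvision model such as VGG to reproduce the real observation); the point is only to show the hook mechanics:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Untrained stand-in for the first block of a CNN (hypothetical; replace with
# e.g. torchvision.models.vgg16(...) to test the actual claim).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

boundary_stats = {}

def pool_hook(module, inputs, output):
    feat = inputs[0]
    H, W = feat.shape[-2:]
    # Recompute the same pooling, but keep the argmax indices.
    _, idx = F.max_pool2d(feat, module.kernel_size, module.stride,
                          module.padding, return_indices=True)
    # Flat indices -> (row, col) within each feature map.
    rows, cols = idx // W, idx % W
    on_border = (rows == 0) | (rows == H - 1) | (cols == 0) | (cols == W - 1)
    boundary_stats["border"] = on_border.sum().item()
    boundary_stats["total"] = idx.numel()

model[2].register_forward_hook(pool_hook)

# Fake "normalized" input spanning roughly the range discussed above.
x = torch.empty(1, 3, 32, 32).uniform_(-2.1179, 2.64)
model(x)
print(boundary_stats)  # how many pooling winners sit on the feature-map border
```

Alternatively, `F.max_unpool2d(pooled, idx, ...)` scatters the pooled values back to their winning positions, which is the "print the unpooling result" variant described above.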

I am not sure to understand why Keras models don’t have this issue given that they are supposed to implement exactly the same thing. What is the implementation difference that makes Keras not have this behaviour?

Actually, I do not know exactly why Keras does not have the issue. I assume that the only difference is the data preprocessing. E.g., for Inception v3, the preprocessed range in Keras is [-1, 1], whereas in PyTorch it is [-2.1179, 2.64]. During training, even if both use the same optimization strategy, they can end up with different parameters, and hence the issue can show up to different degrees. It is not a systematic error, just some randomness. That is my personal opinion!
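For reference, the Keras Inception v3 preprocessing mentioned above (`tf.keras.applications.inception_v3.preprocess_input`) is just a rescale of [0, 255] pixel values to [-1, 1], with no per-channel mean/std:

```python
def keras_inception_preprocess(x):
    # Equivalent of Keras' "tf" preprocessing mode: [0, 255] -> [-1, 1].
    return x / 127.5 - 1.0

print(keras_inception_preprocess(0.0), keras_inception_preprocess(255.0))  # -1.0 1.0
```

So the two frameworks feed the same architecture inputs with noticeably different ranges and per-channel offsets, which is the asymmetry being blamed here.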