Inconsistency between caffe and pytorch for max-pooling

I have a 3x3 max pooling layer in Caffe:

layer {
  name: "pool"
  type: "Pooling"
  bottom: "conv1"
  top: "pool"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
    engine: CUDNN
  }
}

I tried to convert it to PyTorch with:

self.pool = nn.MaxPool3d(kernel_size=3, stride=2)

For an input feature map of size torch.Size([1, 64, 32, 32, 32]), the Caffe output size is [1, 64, 16, 16, 16], while the PyTorch output size is [1, 64, 15, 15, 15]. What is happening?

Maybe Caffe's CUDNN engine requires a kernel size of 3 instead of 2. Am I right? Or it may be an implementation issue in cuDNN.

I found an answer here. Can anyone confirm whether it is correct to use the code below to get the same pooling behavior as in Caffe?

self.pool = nn.MaxPool2d(kernel_size=3, padding=1, stride=2)

When implementing neural nets, particularly convnets, I always recommend writing down the dimensions you expect to get (e.g., via comments in code or goode olde pen & paper) and see if they match with what you actually get. This is really important for making sure the network is doing what you intend it to do.

That being said, your expected output size should be

(32 - 3) / 2 + 1 = 15.5

I suppose Caffe is rounding up whereas PyTorch is rounding down.
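To check the arithmetic above, you can evaluate the usual sliding-window formula with both roundings. This is a minimal sketch; pool_out_size is a hypothetical helper, and the ceil branch stands for the "rounding up" guessed at above.

```python
import math

def pool_out_size(n, kernel, stride, pad=0, ceil_mode=False):
    """Output length of one pooled dimension (hypothetical helper)."""
    span = n + 2 * pad - kernel          # room the window can slide over
    steps = math.ceil(span / stride) if ceil_mode else span // stride
    return steps + 1                     # +1 for the window at position 0

print(pool_out_size(32, 3, 2))                  # 15 (rounding down)
print(pool_out_size(32, 3, 2, ceil_mode=True))  # 16 (rounding up)
```

The 15.5 from the formula turns into 15 or 16 depending only on which way the framework rounds.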

PS: I would try to design your nets so that these sizes come out as whole numbers, because fractional sizes are handled differently from framework to framework, which can cause a lot of confusion.

Thanks, rasbt. Someone mentioned:

Differences in the implementation of pooling: in Keras, half-windows are discarded, whereas Caffe produces an additional output for half-windows.
Differences in padding schemes: the ‘same’ padding in Keras can sometimes result in different padding values for top-bottom (or left-right), whereas Caffe always pads evenly on both sides, so the top-bottom (or left-right) padding values are always equal.
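The uneven ‘same’ padding in the second point comes from how TensorFlow/Keras splits the total padding between the two sides. Here is a minimal sketch of that rule as I understand it; tf_same_padding is a hypothetical helper, not a Keras API.

```python
import math

def tf_same_padding(n, kernel, stride):
    """(pad_before, pad_after) for TF/Keras-style 'same' padding (hypothetical helper)."""
    out = math.ceil(n / stride)                       # 'same' keeps ceil(n / stride) outputs
    total = max((out - 1) * stride + kernel - n, 0)   # padding needed to fit that many windows
    before = total // 2                               # the smaller half goes first
    return before, total - before                     # an odd leftover pixel goes at the end

print(tf_same_padding(32, 3, 2))  # (0, 1): uneven
print(tf_same_padding(33, 3, 2))  # (1, 1): even
```

For the 32-pixel, kernel 3, stride 2 case in this thread, the single padding pixel goes on the trailing side only, so the two sides end up different.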

That comparison was between Caffe and Keras/TensorFlow, though. I guess PyTorch follows Caffe's rule.

Yeah, this sounds correct; it’s basically the “rounding up.” So is your goal to match Caffe's output exactly, or is either way fine with you and the question was more about understanding what's going on? If the former, you would need to create a custom padding layer (or just append a 1-pixel-wide padding vector to the images before max-pooling). I suppose setting padding=1 won’t give you the Caffe results, because it would add 1 pixel on each side.
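Below is a minimal sketch of the "append a padding vector" idea, assuming the one extra pixel goes only on the trailing (right/bottom/back) side of each spatial dimension. An increasing arange input is used so the variants are easy to tell apart: each window's maximum sits at its last covered index.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Values increase along every spatial axis, so a window's max is its "last" cell.
x = torch.arange(32 ** 3, dtype=torch.float32).reshape(1, 1, 32, 32, 32)

pool = nn.MaxPool3d(kernel_size=3, stride=2)

# F.pad lists the last dim first: (W_left, W_right, H_top, H_bottom, D_front, D_back).
# -inf keeps padded cells from winning the max, even for negative activations.
x_trail = F.pad(x, (0, 1, 0, 1, 0, 1), value=float("-inf"))
out_trail = pool(x_trail)

# Symmetric padding=1 for comparison: same shape, but every window shifted by one.
out_sym = nn.MaxPool3d(kernel_size=3, stride=2, padding=1)(x)

print(out_trail.shape)  # torch.Size([1, 1, 16, 16, 16])
print(out_sym.shape)    # torch.Size([1, 1, 16, 16, 16])
print(out_trail[0, 0, 0, 0, 0].item(), out_sym[0, 0, 0, 0, 0].item())  # 2114.0 1057.0
```

Both variants give 16 outputs per dimension, but only the trailing-padding one starts its windows at index 0, which is what rounding up without any padding implies.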

Thanks for your explanation. Could you help me write it? My expected output is 16x16 using kernel size 3 and stride 2 with an input size of 32x32. Thanks.

I think this is what you want :)

Thanks. But I think Caffe pads both sides, am I right?

Yeah, I think so. I am not a Caffe expert, though; probably you need to add a padding of one to each side.

You can change your PyTorch code to:

self.pool = nn.MaxPool3d(kernel_size=3, stride=2, ceil_mode=True)

The default ceil_mode is False, so PyTorch drops the last window when fewer elements than the kernel size remain at the end of the input. Caffe, by contrast, uses ceil mode.
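A quick check of this fix on the input size from the question. The data is random, so only the shapes and the contents of the last (partial) window are meaningful here.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32, 32)

floor_pool = nn.MaxPool3d(kernel_size=3, stride=2)                 # default ceil_mode=False
ceil_pool = nn.MaxPool3d(kernel_size=3, stride=2, ceil_mode=True)  # rounds up, like Caffe

print(floor_pool(x).shape)  # torch.Size([1, 64, 15, 15, 15])
print(ceil_pool(x).shape)   # torch.Size([1, 64, 16, 16, 16])

# The extra output per dimension covers only the partial window at indices 30..31.
last = ceil_pool(x)[..., -1, -1, -1]
print(torch.equal(last, x[..., 30:, 30:, 30:].amax(dim=(-3, -2, -1))))  # True
```

No explicit padding is needed: the partial window is simply pooled over the elements that exist.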
