I spent two hours of my life yesterday tracking down a weird “bug”.
I trained a network on image patches to generate a descriptor for a given patch size, i.e. the receptive field of the network corresponds to the patch size, and the output is a descriptor for that patch: for example, for a 25x25x1 (single-channel) patch, the output is a 1x1x(descriptor size) tensor.
I implemented all of this in a fully convolutional manner, so I expected it to work as a dense feature descriptor when I feed images larger than 25x25 into the net. So far, so good.
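For the record, here is a minimal sketch of what I mean (the layers and sizes are made up for illustration, not my actual architecture, and I'm assuming the PyTorch API): a stack whose receptive field is 25x25 collapses a single patch to a 1x1 output, and the same net applied to a larger image gives a dense grid of descriptors.

```python
import torch
import torch.nn as nn

# Hypothetical patch-descriptor net (illustrative only): three 9x9
# convolutions give a combined receptive field of 25x25, so a 25x25
# patch shrinks to a 1x1 spatial output with `desc_dim` channels.
desc_dim = 128
net = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=9),          # 25 -> 17
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=9),         # 17 -> 9
    nn.ReLU(),
    nn.Conv2d(64, desc_dim, kernel_size=9),   # 9 -> 1
)

patch = torch.randn(1, 1, 25, 25)
print(net(patch).shape)    # torch.Size([1, 128, 1, 1])

# Fully convolutional, so a larger image yields a dense grid of
# descriptors (effective stride 1 here, since this toy net has no
# pooling or strided layers).
image = torch.randn(1, 1, 100, 100)
print(net(image).shape)    # torch.Size([1, 128, 76, 76])
```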
However, my architecture contains several pooling layers. For all of them - EXCEPT ONE … - I explicitly stated the kernel size and the stride (e.g. k=5, s=2). So in the end I wondered why I got a different effective stride than what I was expecting (effective stride = s1 x s2 x …). After looking into my code for two hours, I found a very embarrassing mistake: for the one pooling layer where I didn't explicitly state the stride, I had a kernel_size of 3, and by default the stride is set to kernel_size. This was completely unexpected for me, since I was used to the convolution syntax, where stride=1 by default.
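Here is a minimal reproduction of the surprise (again assuming PyTorch's torch.nn, which is where this default-stride behaviour comes from): with the same kernel_size=3, a convolution keeps stride 1 by default, while a pooling layer silently uses stride=kernel_size, and that extra factor multiplies into the effective stride of the whole net.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 24, 24)

conv = nn.Conv2d(1, 1, kernel_size=3)   # stride defaults to 1
pool = nn.MaxPool2d(kernel_size=3)      # stride defaults to kernel_size (= 3)

print(conv.stride)       # (1, 1)
print(pool.stride)       # 3
print(conv(x).shape)     # torch.Size([1, 1, 22, 22]) -> spatial stride 1
print(pool(x).shape)     # torch.Size([1, 1, 8, 8])   -> spatial stride 3

# To get the stride I actually wanted, it has to be stated explicitly:
pool_s1 = nn.MaxPool2d(kernel_size=3, stride=1)
print(pool_s1(x).shape)  # torch.Size([1, 1, 22, 22])
```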
This behaviour is correctly documented, but at least for me it was pretty unexpected.
What do you think about this? Should the default value maybe be set to 1? Did anyone else run into this “problem”?