I spent two hours of my life yesterday tracking down a weird “bug”.
I trained a network on image patches to generate a descriptor for a given patch size, i.e. the receptive field of the network corresponds to the patch size, and the output is a descriptor for that patch: for example, for a 25x25x1 (single-channel) patch, the output is a 1x1x(descriptor size) tensor.
I implemented all of this in a fully convolutional manner, so I expected it to work as a dense feature descriptor when I feed images larger than 25x25 into the net. So far, so good.
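For the record, here is a minimal sketch of what I mean (the layers and sizes are made up for illustration, not my actual architecture, and I'm assuming the PyTorch API): a stack whose receptive field is 25x25 collapses a single patch to a 1x1 output, and the same net applied to a larger image gives a dense grid of descriptors.

```python
import torch
import torch.nn as nn

# Hypothetical patch-descriptor net (illustrative only): three 9x9
# convolutions give a combined receptive field of 25x25, so a 25x25
# patch shrinks to a 1x1 spatial output with `desc_dim` channels.
desc_dim = 128
net = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=9),          # 25 -> 17
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=9),         # 17 -> 9
    nn.ReLU(),
    nn.Conv2d(64, desc_dim, kernel_size=9),   # 9 -> 1
)

patch = torch.randn(1, 1, 25, 25)
print(net(patch).shape)    # torch.Size([1, 128, 1, 1])

# Fully convolutional, so a larger image yields a dense grid of
# descriptors (effective stride 1 here, since this toy net has no
# pooling or strided layers).
image = torch.randn(1, 1, 100, 100)
print(net(image).shape)    # torch.Size([1, 128, 76, 76])
```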
However, my architecture contains several pooling layers. For all of them - EXCEPT ONE … - I explicitly stated the kernel size and the stride (e.g. k=5, s=2). So in the end I wondered why I got a different effective stride than what I was expecting (effective stride = s1 x s2 x …). After looking into my code for two hours, I found a very embarrassing mistake: for the one pooling layer where I didn't explicitly state the stride, I had a kernel_size of 3, and by default the stride is set to kernel_size. This was completely unexpected for me, since I was used to the convolution syntax, where stride=1 by default.
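Here is a minimal reproduction of the surprise (again assuming PyTorch's torch.nn, which is where this default-stride behaviour comes from): with the same kernel_size=3, a convolution keeps stride 1 by default, while a pooling layer silently uses stride=kernel_size, and that extra factor multiplies into the effective stride of the whole net.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 24, 24)

conv = nn.Conv2d(1, 1, kernel_size=3)   # stride defaults to 1
pool = nn.MaxPool2d(kernel_size=3)      # stride defaults to kernel_size (= 3)

print(conv.stride)       # (1, 1)
print(pool.stride)       # 3
print(conv(x).shape)     # torch.Size([1, 1, 22, 22]) -> spatial stride 1
print(pool(x).shape)     # torch.Size([1, 1, 8, 8])   -> spatial stride 3

# To get the stride I actually wanted, it has to be stated explicitly:
pool_s1 = nn.MaxPool2d(kernel_size=3, stride=1)
print(pool_s1(x).shape)  # torch.Size([1, 1, 22, 22])
```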
This behaviour is correctly documented, but at least for me it was pretty unexpected.
What do you think about this? Should the default value maybe be set to 1? Did anyone else run into this “problem”?