Dear PyTorch community,
I noticed that the downsample branch used in the ResNet models is a stride-2 convolution. That is fine, but what worries me is that kernel_size is set to… 1!
Either kernel_size=1 or stride=2 on its own would be okay, but together… doesn't that skip most of the image? My understanding is that kernel_size=1 with stride=2 samples something like this:
xoxo ...
oooo
xoxo
oooo
...
where the o's don't matter at all, so 3/4 of the feature map is ignored outright. I know max pooling also discards information, but at least it looks at every value before deciding which ones to keep. This doesn't even look at the pixels in the 'o' positions.
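Here is a minimal sketch (my own, not from torchvision; channel counts shrunk to 1 for clarity) that confirms this: zeroing every 'o' position leaves the output of a 1x1 stride-2 convolution completely unchanged.

import torch
import torch.nn as nn

# a 1x1 convolution with stride 2, the same kind of op ResNet uses
# in its downsample shortcut
conv = nn.Conv2d(1, 1, kernel_size=1, stride=2, bias=False)

x = torch.randn(1, 1, 4, 4)

# zero out every 'o' position, i.e. every odd row and odd column
masked = x.clone()
masked[..., 1::2, :] = 0  # odd rows
masked[..., :, 1::2] = 0  # odd columns

# identical outputs: the convolution never reads the 'o' pixels
print(torch.equal(conv(x), conv(masked)))  # True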
If you don’t believe me, do this:
import torchvision
model = torchvision.models.resnet18()
print(model)
and you get a big description, including lines like:
(0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
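In case it helps to locate it, that quoted line comes from the shortcut branch of the first block of layer2 (assuming the current torchvision layout, where each BasicBlock stores its shortcut as .downsample); layer3 and layer4 have matching ones:

# reusing the model built above
print(model.layer2[0].downsample)
# Sequential(
#   (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
#   (1): BatchNorm2d(128, ...)
# )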