How does strided convolution work?

conrad · December 13, 2020, 10:40pm

This is an extremely basic question, but I can’t find an answer anywhere. Let’s say I take Conv2d(nin, nout, kernel_size=1, stride=2). What are the coordinates of the pixels that get the convolution operation applied? Does it start at the top left with coordinate (0, 0) and then move to (0, 2)? Or does it start at coordinate (1, 1) and then move to (1, 3)?

JamesDickens · December 13, 2020, 10:51pm

I believe it would start at (0, 0) and then move to (0, 2). You can check this by creating your own Conv2d layer with 1 input channel and 1 output channel, passing in a simple input image such as torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) (as floats, reshaped to (1, 1, 3, 3) so it has batch and channel dimensions), setting the layer's weight to all 1s and its bias to 0, and observing the result, which I believe should be [[1, 3], [7, 9]].
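
A minimal sketch of that check, assuming PyTorch is installed. The variable names x, conv, and result are just illustrative choices, not anything from the original question:

```python
import torch
import torch.nn as nn

# 3x3 single-channel "image", reshaped to (batch, channels, height, width).
x = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]]).reshape(1, 1, 3, 3)

# 1 channel in, 1 channel out, 1x1 kernel, stride 2 (default padding=0).
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=1, stride=2)

with torch.no_grad():
    conv.weight.fill_(1.0)  # the single 1x1 weight becomes 1, so each output equals the pixel it reads
    conv.bias.zero_()       # no offset added
    out = conv(x)

result = out.squeeze().tolist()
print(result)  # [[1.0, 3.0], [7.0, 9.0]]
```

Because the weight is 1 and the bias is 0, each output value is simply the input pixel that was sampled, so the 1, 3, 7, 9 in the result correspond to input coordinates (0, 0), (0, 2), (2, 0), and (2, 2).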

conrad · December 13, 2020, 11:00pm

That works. I confirmed that it does start at (0, 0).

Thanks for the help!
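
For other input sizes or strides, the sampled positions can be worked out without running a layer at all. The sketch below assumes padding=0 and dilation=1 (the Conv2d defaults); sampled_coords is a hypothetical helper written for illustration, not a PyTorch function:

```python
def sampled_coords(height, width, kernel_size=1, stride=2):
    """Top-left input coordinate of each kernel window.

    Assumes padding=0 and dilation=1, so output position (i, j)
    reads the window that starts at input (i * stride, j * stride).
    """
    out_h = (height - kernel_size) // stride + 1
    out_w = (width - kernel_size) // stride + 1
    return [[(i * stride, j * stride) for j in range(out_w)]
            for i in range(out_h)]

coords = sampled_coords(3, 3)
print(coords)  # [[(0, 0), (0, 2)], [(2, 0), (2, 2)]]
```

Under the same assumptions, sampled_coords(4, 4) returns the same four coordinates, which means the last row and column of a 4x4 input are never read by a 1x1 kernel with stride 2.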