How does strided convolution work?

This is an extremely basic question, but I can’t find an answer anywhere. Let’s say I take Conv2d(nin, nout, kernel_size=1, stride=2). What are the coordinates of the pixels that get the convolution operation applied? Does it start at the top left with coordinate (0, 0) and then move to (0, 2)? Or does it start at coordinate (1, 1) and then move to (1, 3)?

I believe it starts at (0, 0) and then moves to (0, 2). You can check this by creating your own Conv2d layer with kernel_size=1 and stride=2, passing a simple array like [[1, 2, 3], [4, 5, 6], [7, 8, 9]] through torch.tensor as your input image (1 channel in and 1 channel out, reshaped to the (1, 1, 3, 3) batch/channel/height/width layout that Conv2d expects), setting the weight of the conv layer to all 1s and its bias to 0, and then observing the result, which I believe should be [[1, 3], [7, 9]].
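Here is a minimal sketch of that check, assuming a 1-channel 3x3 input and the default padding of 0. The variable names (x, conv, y, result) are just illustrative choices.

```python
import torch
import torch.nn as nn

# 3x3 test image, reshaped to (batch=1, channels=1, height=3, width=3)
x = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]]).reshape(1, 1, 3, 3)

# 1x1 kernel with stride 2, one channel in and one channel out
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=1, stride=2)

with torch.no_grad():
    conv.weight.fill_(1.0)  # the single weight becomes 1, so sampled pixels pass through unchanged
    conv.bias.zero_()       # no offset added to the output
    y = conv(x)

# Drop the batch and channel dimensions to get the 2x2 output
result = y.squeeze().tolist()
print(result)  # [[1.0, 3.0], [7.0, 9.0]]
```

The output keeps the values at input positions (0, 0), (0, 2), (2, 0) and (2, 2), so the window starts at the top-left corner and then steps by the stride.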


That works. I confirmed that it does start at (0, 0).
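For reference, the sampled positions can also be listed without PyTorch. This is only a sketch: sampled_coords is a hypothetical helper (not part of any library), and it assumes dilation 1 and the usual floor-based output size.

```python
def sampled_coords(height, width, kernel_size=1, stride=2, padding=0):
    """Top-left input coordinate (row, col) of each kernel window, in output order.

    Hypothetical helper for illustration; assumes dilation 1.
    """
    # Number of window positions along each axis (floor division)
    out_h = (height + 2 * padding - kernel_size) // stride + 1
    out_w = (width + 2 * padding - kernel_size) // stride + 1
    # Each output index (i, j) maps to input position (i*stride - padding, j*stride - padding)
    return [(i * stride - padding, j * stride - padding)
            for i in range(out_h)
            for j in range(out_w)]

coords = sampled_coords(3, 3)
print(coords)  # [(0, 0), (0, 2), (2, 0), (2, 2)]
```

With padding greater than 0, the first coordinate becomes negative, which means the first window starts inside the padded border rather than on a real pixel.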

Thanks for the help!
