nn.Conv2d output shape

I am confused by the nn.Conv2d output shape. My hand calculations don’t match what the documentation says the output shape should be.

Take an input FloatTensor x with torch.Size = [10, 512, 8, 16] in NCHW order. If:

  • in channels cin = 512
  • out channels cout = 128
  • number of filters f = 128
  • size of filters k = 1
  • stride s = 1
  • padding p = 1
  • height in h = 8
  • dilation d = 1

The API says that the height out should be

floor((h + 2*p - d *(k-1) -1 )/(s+1))
= floor((8 + 2 - 0 -1)/2)
= 4

but when I run the code, the output height is 10.

What am I missing?

Nothing is weird here. In your formula you divided by (s+1), whereas in the docs the division is by s, and 1 is then added to the quotient. The correct formula is:
floor((h + 2*p - d *(k-1) -1 )/s + 1) = floor((8 + 2 - 0 -1)/1 + 1) = 10
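
To double-check the arithmetic without running a model, here is a minimal sketch of that formula in plain Python. The helper name conv2d_out_size is hypothetical (it is not part of PyTorch), and it assumes integer sizes with the same stride, padding, and dilation along the dimension being computed:

```python
def conv2d_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension.

    Hypothetical helper (not a PyTorch function) implementing
    floor((size + 2*p - d*(k-1) - 1) / s + 1) with integer arithmetic.
    """
    numerator = size + 2 * padding - dilation * (kernel_size - 1) - 1
    # Floor-divide by the stride first, then add 1.
    return numerator // stride + 1


# Values from the question: k = 1, s = 1, p = 1, d = 1.
h_out = conv2d_out_size(8, kernel_size=1, stride=1, padding=1, dilation=1)
w_out = conv2d_out_size(16, kernel_size=1, stride=1, padding=1, dilation=1)
print(h_out, w_out)
```

With the values from the question this gives a height of 10 and a width of 18, so the full output shape would be expected to be [10, 128, 10, 18]. Dividing by (s+1) instead gives (8 + 2 - 0 - 1) // 2 = 4, which is where the mismatch came from.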
