What am I missing about my tensor size?

Heyho, me again :D!
I seem to be missing something about my network's tensor size.

This is my network (VGG16 with the classifier removed and a conv layer added):

Sequential(
  (0): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (1): Conv2d(512, 169, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))
)

The tensor size of the output is torch.Size([20, 169, 15, 15]); shouldn't it be torch.Size([20, 169])?
I think the 15x15 has something to do with the max pooling and the original image size. My image input size is torch.Size([20, 3, 416, 416]), and my calculation with the output-size formula out = floor((in + 2*padding - kernel) / stride) + 1 gives a 13x13 final feature map, not a 15x15 one.
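Here is a quick sketch of that calculation (the out_size helper is just my own naming for illustration):

import math

def out_size(size, kernel, stride, padding):
    # standard conv/pool output-size formula
    return math.floor((size + 2 * padding - kernel) / stride) + 1

# the 3x3 convs with padding=1, stride=1 keep the spatial size,
# so only the five MaxPool2d(kernel_size=2, stride=2) layers change it
size = 416
for _ in range(5):
    size = out_size(size, kernel=2, stride=2, padding=0)
print(size)
> 13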

Thanks in advance!

Raph

EDIT:
20 is my batch size.

The output size is defined by your conv and max pooling layers.
Your last Conv2d layer has kernel_size=1 and padding=1, which adds 2 pixels to the height and width of your activation volume.
So if your activation from layer (30) is [batch_size, 512, 13, 13], your final layer will output [batch_size, 169, 15, 15].

import torch
import torch.nn as nn

x = torch.randn(1, 512, 13, 13)   # activation after layer (30)
m = nn.Conv2d(in_channels=512,
              out_channels=169,
              kernel_size=1,
              stride=1,
              padding=1)          # padding=1 adds 2 pixels to height and width
x = m(x)
print(x.shape)
> torch.Size([1, 169, 15, 15])

If you want the output to be [batch_size, 169, 1, 1], you should use a conv layer with kernel_size=13, stride=1, padding=0.
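E.g., a minimal sketch of that variant (same setup as above, with only the kernel size and padding changed):

import torch
import torch.nn as nn

x = torch.randn(1, 512, 13, 13)
m = nn.Conv2d(in_channels=512,
              out_channels=169,
              kernel_size=13,     # kernel covers the full 13x13 activation
              stride=1,
              padding=0)
print(m(x).shape)
> torch.Size([1, 169, 1, 1])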

Thanks, that was my mistake. You're great, thanks!