Parameter count of convolutional layer right after pooling layer is very high compared to other layers

Hi everyone, I have the following model created with nn.Sequential. However, when I ran torchsummary on it, I noticed that the layer right after the pooling layer has a very high parameter count compared to the other layers, and I don’t know why. Here is the output:

Sequential(
  (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=same)
  (1): Conv2d(64, 16, kernel_size=(1, 1), stride=(1, 1), padding=same, dilation=(7, 7))
  (2): Conv2d(16, 64, kernel_size=(1, 1), stride=(1, 1), padding=same, dilation=(3, 3))
  (3): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), padding=same, dilation=(5, 5))
  (4): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), padding=same, dilation=(3, 3))
  (5): MaxPool2d(kernel_size=3, stride=3, padding=0, dilation=1, ceil_mode=False)
  (6): Conv2d(64, 64, kernel_size=(7, 7), stride=(1, 1), padding=same, dilation=(7, 7))
  (7): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 256, 256]           1,792
            Conv2d-2         [-1, 16, 256, 256]           1,040
            Conv2d-3         [-1, 64, 256, 256]           1,088
            Conv2d-4         [-1, 64, 256, 256]           4,160
            Conv2d-5         [-1, 64, 256, 256]           4,160
         MaxPool2d-6           [-1, 64, 85, 85]               0
            Conv2d-7           [-1, 64, 85, 85]         200,768
       BatchNorm2d-8           [-1, 64, 85, 85]             128
================================================================
Total params: 213,136
Trainable params: 213,136
Non-trainable params: 0

This is expected, as that conv layer has 64 * 64 * 7 * 7 = 200704 weights (plus 64 bias terms), i.e. 200768 trainable parameters.
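
You can verify this directly, e.g. with a minimal sketch that rebuilds just that one layer:

import torch.nn as nn

# Rebuild only the layer that follows the pooling step:
# 64 input channels, 64 output channels, a 7x7 kernel.
conv = nn.Conv2d(64, 64, kernel_size=7, padding='same', dilation=7)

# weight: 64 * 64 * 7 * 7 = 200704, bias: 64 -> 200768 in total
print(sum(p.numel() for p in conv.parameters()))  # 200768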

Oh, I understand now, it’s because of

kernel_size=(7, 7)

and

dilation=(7, 7)

The dilation does not change the parameter count; it only spreads the kernel taps over a larger receptive field. The large count thus comes from the kernel size and the layer’s input/output channel counts.
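
A quick check (again just a sketch): two layers that differ only in dilation have identical parameter counts.

import torch.nn as nn

# Same channels and kernel size, different dilation:
a = nn.Conv2d(64, 64, kernel_size=7, padding='same', dilation=1)
b = nn.Conv2d(64, 64, kernel_size=7, padding='same', dilation=7)

def n_params(m):
    return sum(p.numel() for p in m.parameters())

print(n_params(a), n_params(b))  # 200768 200768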
