# 2D dilated convolution on 1x1xC tensor

I’m trying to understand the lines below:

```python
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(MyModule, self).__init__()
        # avg_pool is not shown in the original snippet; a global pool
        # (nn.AdaptiveAvgPool2d(1)) matches the shapes described below
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv3_3 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=3, dilation=3)

    def forward(self, x):
        x_y = self.avg_pool(x)
        x_c = self.conv3_3(x_y)
        return x_c
```

which translates into: CONV3_DIL3(AVG_POOL1(X)). Now:

• Imagine that X.shape = [1, 64, 16, 16], i.e. BxCxHxW.
• After applying X_Y = AVG_POOL1(X), we obtain X_Y.shape = [1, 64, 1, 1] => 1xCx1x1.
• Finally, after applying X_C = CONV3_DIL3(X_Y), we obtain X_C.shape = [1, 64, 1, 1] => 1xCx1x1, because the padding of 3 grows the 1x1 input to 7x7, which exactly matches the 7x7 effective extent of a 3x3 kernel with dilation=3.
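The shape bookkeeping above can be checked directly. A minimal sketch, assuming the pooling layer is a global `nn.AdaptiveAvgPool2d(1)` (the original module does not show its definition):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 16, 16)       # BxCxHxW
avg_pool = nn.AdaptiveAvgPool2d(1)   # assumed: global average pool to 1x1
conv3_3 = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=3, dilation=3)

x_y = avg_pool(x)
x_c = conv3_3(x_y)
print(x_y.shape)  # torch.Size([1, 64, 1, 1])
print(x_c.shape)  # torch.Size([1, 64, 1, 1])
```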

Now, my questions are: (i) what exactly does it mean to apply a 2D conv with a 3x3 filter and dilation=3 to a 1xCx1x1 tensor, especially since there’s no spatial information in a flat 1xCx1x1 tensor, and (ii) why does this make sense at all?
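With dilation=3, the nine kernel taps are spaced 3 pixels apart, so they span a 7x7 window; the padded input is also 7x7, but only its center pixel holds real data, and the eight outer taps always read zero padding. A quick sketch (layer sizes chosen arbitrarily for illustration) confirms that zeroing those outer taps leaves the output unchanged:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=3, dilation=3)
x_y = torch.randn(1, 64, 1, 1)  # pooled input, no spatial extent

out = conv(x_y)

# zero every tap except the center one; the 8 outer taps only ever see padding
with torch.no_grad():
    mask = torch.zeros(3, 3)
    mask[1, 1] = 1.0
    conv.weight.mul_(mask)

print(torch.allclose(out, conv(x_y)))  # True
```

So on a 1x1 feature map the dilated 3x3 conv degenerates to a linear map driven by the center weight alone.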

I don’t think the larger kernel size, padding, and dilation make any difference compared to applying a 1x1 kernel that uses only the center weight of the 3x3 kernel.
Here is a small code snippet to show the equivalence:

```python
import torch
import torch.nn as nn

# setup
x = torch.randn(1, 3, 24, 24)
avg_pool = nn.AdaptiveAvgPool2d(1)
conv3_3 = nn.Conv2d(3, 1, kernel_size=3, stride=1, padding=3, dilation=3)

# reference calculation
out_ref = conv3_3(avg_pool(x))

# using a kernel size of 1, copying the center tap of conv3_3
conv_new = nn.Conv2d(3, 1, kernel_size=1)
with torch.no_grad():
    conv_new.weight.copy_(conv3_3.weight[:, :, 1:2, 1:2])
    conv_new.bias.copy_(conv3_3.bias)

out_new = conv_new(avg_pool(x))
print(torch.allclose(out_ref, out_new))  # True
```