# 2D dilated convolution on 1x1xC tensor

I’m trying to understand the lines below:

```python
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(MyModule, self).__init__()
        # avg_pool is not shown in the original snippet; a global pool
        # (nn.AdaptiveAvgPool2d(1)) matches the shapes described below
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv3_3 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=3, dilation=3)

    def forward(self, x):
        x_y = self.avg_pool(x)
        x_c = self.conv3_3(x_y)
        return x_c
```

which translates into: CONV3_DIL3(AVG_POOL1(X)). Now:

• Imagine that X.shape = [1, 64, 16, 16], i.e. BxCxHxW.
• After applying X_Y = AVG_POOL1(X), we obtain X_Y.shape = [1, 64, 1, 1] => 1xCx1x1.
• Finally, after applying X_C = CONV3_DIL3(X_Y), we obtain X_C.shape = [1, 64, 1, 1] => 1xCx1x1, because the padding of 3 grows the 1x1 input to 7x7, which exactly matches the 7x7 effective extent of a 3x3 kernel with dilation=3.
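The shape bookkeeping above can be checked directly. A minimal sketch, assuming the pooling layer is a global `nn.AdaptiveAvgPool2d(1)` (the original module does not show its definition):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 16, 16)       # BxCxHxW
avg_pool = nn.AdaptiveAvgPool2d(1)   # assumed: global average pool to 1x1
conv3_3 = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=3, dilation=3)

x_y = avg_pool(x)
x_c = conv3_3(x_y)
print(x_y.shape)  # torch.Size([1, 64, 1, 1])
print(x_c.shape)  # torch.Size([1, 64, 1, 1])
```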

Now, my questions are: (i) what exactly does it mean to apply a 2D conv with a 3x3 filter and dilation=3 to a 1xCx1x1 tensor, especially since there’s no spatial information in a flat 1xCx1x1 tensor, and (ii) why does this make sense at all?
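With dilation=3, the nine kernel taps are spaced 3 pixels apart, so they span a 7x7 window; the padded input is also 7x7, but only its center pixel holds real data, and the eight outer taps always read zero padding. A quick sketch (layer sizes chosen arbitrarily for illustration) confirms that zeroing those outer taps leaves the output unchanged:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=3, dilation=3)
x_y = torch.randn(1, 64, 1, 1)  # pooled input, no spatial extent

out = conv(x_y)

# zero every tap except the center one; the 8 outer taps only ever see padding
with torch.no_grad():
    mask = torch.zeros(3, 3)
    mask[1, 1] = 1.0
    conv.weight.mul_(mask)

print(torch.allclose(out, conv(x_y)))  # True
```

So on a 1x1 feature map the dilated 3x3 conv degenerates to a linear map driven by the center weight alone.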

I don’t think the larger kernel size, padding, and dilation make any difference compared to applying a 1x1 kernel that uses only the center weight of the 3x3 kernel.
Here is a small code snippet to show the equivalence:

```python
import torch
import torch.nn as nn

# setup
x = torch.randn(1, 3, 24, 24)
avg_pool = nn.AdaptiveAvgPool2d(1)
conv3_3 = nn.Conv2d(3, 1, kernel_size=3, stride=1, padding=3, dilation=3)

# reference calculation
out_ref = conv3_3(avg_pool(x))

# using a kernel size of 1, copying the center tap of conv3_3
conv_new = nn.Conv2d(3, 1, kernel_size=1)
with torch.no_grad():
    conv_new.weight.copy_(conv3_3.weight[:, :, 1:2, 1:2])
    conv_new.bias.copy_(conv3_3.bias)

out_new = conv_new(avg_pool(x))
print(torch.allclose(out_ref, out_new))  # True
```