# 2D dilated convolution on 1x1xC tensor

I’m trying to understand the lines below:

```python
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(MyModule, self).__init__()
        # AVG_POOL1: global average pooling down to 1x1 (assumed AdaptiveAvgPool2d)
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv3_3 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                                 stride=1, padding=3, dilation=3)

    def forward(self, x):
        x_y = self.avg_pool(x)
        x_c = self.conv3_3(x_y)
        return x_c
```

which translates into: `CONV3_DIL3(AVG_POOL1(X))`. Now:

• Imagine that `X.shape = [1, 64, 16, 16]`, i.e. `BxCxHxW`.
• After applying `X_Y = AVG_POOL1(X)`, we obtain `X_Y.shape = [1, 64, 1, 1]` => `1xCx1x1`.
• Finally, after applying `X_C = CONV3_DIL3(X_Y)`, we obtain `X_C.shape = [1, 64, 1, 1]` => `1xCx1x1`, because of the padding.
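The shapes above can be checked directly; a minimal sketch, assuming `AVG_POOL1` is an `AdaptiveAvgPool2d(1)`:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 16, 16)          # BxCxHxW
avg_pool = nn.AdaptiveAvgPool2d(1)      # assumed implementation of AVG_POOL1
conv3_3 = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=3, dilation=3)

x_y = avg_pool(x)
x_c = conv3_3(x_y)
print(x_y.shape)  # torch.Size([1, 64, 1, 1])
print(x_c.shape)  # torch.Size([1, 64, 1, 1])
```

The output stays `1x1` because `H_out = (1 + 2*3 - 3*(3-1) - 1)/1 + 1 = 1`.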

Now, my questions are: (i) what does it mean exactly to apply a 2D convolution with a `3x3` kernel and `dilation=3` to a `1xCx1x1` tensor, especially since a flat `1xCx1x1` tensor carries no spatial information, and (ii) why does this make sense at all?
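Mechanically, the `1x1` input is zero-padded to `7x7`, and the dilated `3x3` kernel samples the padded rows/columns `0, 3, 6`, so only the center tap `(1, 1)` lands on the real value at `(3, 3)`; every other tap reads a padded zero. A small sketch (my own check, not from the original code) confirming that zeroing all non-center taps leaves the output unchanged:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=3, dilation=3)
x_y = torch.randn(1, 64, 1, 1)  # the pooled 1xCx1x1 tensor

out_full = conv(x_y)

# zero every kernel tap except the center one at (1, 1)
with torch.no_grad():
    conv.weight[:, :, 0, :] = 0
    conv.weight[:, :, 2, :] = 0
    conv.weight[:, :, :, 0] = 0
    conv.weight[:, :, :, 2] = 0

out_center_only = conv(x_y)
print(torch.allclose(out_full, out_center_only))  # True
```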

I don’t think the larger kernel size, padding, and dilation make any difference compared to applying a `1x1` kernel built from the center tap of the `3x3` kernel.
Here is a small code snippet to show the equivalence:

```python
import torch
import torch.nn as nn

# setup
x = torch.randn(1, 3, 24, 24)
avg_pool = nn.AdaptiveAvgPool2d(1)  # pools the input down to 1x1
conv3_3 = nn.Conv2d(3, 1, kernel_size=3, stride=1, padding=3, dilation=3)

# reference calculation
out_ref = conv3_3(avg_pool(x))

# using a kernel size of 1, initialized from the center tap of conv3_3
conv_new = nn.Conv2d(3, 1, kernel_size=1)
with torch.no_grad():
    conv_new.weight.copy_(conv3_3.weight[:, :, 1:2, 1:2])
    conv_new.bias.copy_(conv3_3.bias)

out_new = conv_new(avg_pool(x))
print(torch.allclose(out_ref, out_new))  # True
```