2D dilated convolution on 1x1xC tensor

I’m trying to understand the lines below:

import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(MyModule, self).__init__()
        # global average pooling: collapses HxW down to 1x1
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # 3x3 convolution with dilation=3 and padding=3
        self.conv3_3 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=3, dilation=3)

    def forward(self, x):
        x_y = self.avg_pool(x)
        x_c = self.conv3_3(x_y)
        return x_c

which translates into: CONV3_DIL3(AVG_POOL1(X)). Now:

  • Imagine that X.shape = [1, 64, 16, 16], i.e. BxCxHxW.
  • After applying X_Y = AVG_POOL1(X), we obtain X_Y.shape = [1, 64, 1, 1] => 1xCx1x1
  • Finally, after applying X_C = CONV3_DIL3(X_Y), we obtain X_C.shape = [1, 64, 1, 1] => 1xCx1x1 because of the padding (see the quick shape check below).
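
A minimal shape check, assuming in_channels = out_channels = 64 (these are just example values, not taken from the original code):

# quick shape check (sketch; the 64 channels are assumed example values)
import torch
m = MyModule(64, 64)
x = torch.randn(1, 64, 16, 16)         # BxCxHxW
print(m.avg_pool(x).shape)             # torch.Size([1, 64, 1, 1])
print(m.conv3_3(m.avg_pool(x)).shape)  # torch.Size([1, 64, 1, 1])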

Now, my questions are: (i) what exactly does it mean to apply a 2D convolution with a 3x3 filter and dilation=3 to a 1xCx1x1 tensor, especially since a flat 1xCx1x1 tensor carries no spatial information, and (ii) why does this make sense at all?

I don’t think the larger kernel size, padding, and dilation make any difference compared to a 1x1 convolution that uses only the center weight of the 3x3 kernel. With padding=3 the 1x1 input is zero-padded to 7x7, and a 3x3 kernel with dilation=3 spans exactly 7 pixels, so the output is again 1x1; of the nine dilated taps, only the center one lands on the single real value, while the other eight fall on zero padding.
Here is a small code snippet to show the equivalence:

# setup
import torch
import torch.nn as nn

x = torch.randn(1, 3, 24, 24)
avg_pool = nn.AdaptiveAvgPool2d(1)
conv3_3 = nn.Conv2d(3, 1, kernel_size=3, stride=1, padding=3, dilation=3)

# reference calculation
out_ref = conv3_3(avg_pool(x))

# using a kernel size of 1: copy only the center tap of the dilated 3x3 kernel
conv_new = nn.Conv2d(3, 1, kernel_size=1)
with torch.no_grad():
    conv_new.weight.copy_(conv3_3.weight[:, :, 1:2, 1:2])
    conv_new.bias.copy_(conv3_3.bias)
out_new = conv_new(avg_pool(x))

# compare
print(torch.allclose(out_ref, out_new))
> True
print((out_ref - out_new).abs().max())
> tensor(0., grad_fn=<MaxBackward1>)
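
As a further sanity check (a sketch reusing the variables above, not part of the original module), zeroing every weight except the center tap leaves the output unchanged, because the eight outer taps only ever multiply zero padding:

# keep only the center tap of the dilated 3x3 kernel
conv_masked = nn.Conv2d(3, 1, kernel_size=3, stride=1, padding=3, dilation=3)
with torch.no_grad():
    conv_masked.weight.zero_()
    conv_masked.weight[:, :, 1, 1] = conv3_3.weight[:, :, 1, 1]
    conv_masked.bias.copy_(conv3_3.bias)
# prints True: the outer taps never touch the pooled value
print(torch.allclose(conv3_3(avg_pool(x)), conv_masked(avg_pool(x))))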