CNN: array dimensions from convolution make no sense to me

Hi. I have a tensor made up of 8 matrices, each 20x20 numbers. I am trying to do a convolution on this tensor but I don’t understand the output dimensions. Here is what I have so far (x is my tensor):

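import torch
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):

    def __init__(self, n_filters, filter_sizes):
        super(CNN, self).__init__()
        # n_filters[0] is the number of filters (output channels) of the first layer
        self.conv0 = nn.Conv2d(in_channels=1, out_channels=n_filters[0], kernel_size=filter_sizes[0])

    def forward(self, x, lengths=None):
        x = x.unsqueeze(1)  # (8, 20, 20) -> (8, 1, 20, 20)

        map0_conv = F.relu(self.conv0(x))
        map0_avg = torch.mean(map0_conv, dim=1)
        return map0_avg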

My filter size is 2x2, and the stride is 1. Therefore, for each matrix I need 19*19 = 361 filters. Thus, starting with an 8x20x20 tensor, I end up with the dimensions of self.conv0(x) being 8x361x19x19.

I don’t know what this result means. It’s causing me problems in the next step, the torch.mean. What I’m trying to do is take an 8x20x20 tensor, apply a convolution to it, and then average each convolved matrix to end up with one vector containing 8 entries. Clearly I’m doing it wrong but I don’t know how to fix it.


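When you call x.unsqueeze(1), the input you pass to the convolution has size (8, 1, 20, 20): a batch of 8 samples, each with 1 channel of 20x20. The output dimensions of the conv operation then depend on the out_channels you pass in. out_channels is the number of filters, and every filter slides over all 19x19 positions by itself, so the 361 you computed is the number of positions per filter, not the number of filters you need.
Say you pass out_channels=1: you end up with an output of size (8, 1, 19, 19), where 19 = 20 - 2 + 1 because the kernel is 2x2 and the stride is 1. In general, the output size is (8, C, 19, 19), where C is whatever out_channels you pass. With n_filters[0] = 361 that gives exactly the (8, 361, 19, 19) you are seeing.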
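
To make the shape arithmetic concrete, here is a small sketch in plain Python that applies the output-size formula from the nn.Conv2d documentation. The helper names conv_out_size and conv_output_shape are made up for this illustration and are not part of PyTorch; the sketch also assumes a square kernel and the same stride and padding on both axes.

```python
def conv_out_size(n, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis (hypothetical helper, not a PyTorch function)."""
    return (n + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_output_shape(input_shape, out_channels, kernel_size, stride=1, padding=0):
    """Shape of a Conv2d output for an (N, C_in, H, W) input and a square kernel."""
    n, _, h, w = input_shape
    return (
        n,                                               # batch size is unchanged
        out_channels,                                    # one output map per filter
        conv_out_size(h, kernel_size, stride, padding),  # height after the convolution
        conv_out_size(w, kernel_size, stride, padding),  # width after the convolution
    )


# One filter: a single 19x19 map per sample.
print(conv_output_shape((8, 1, 20, 20), out_channels=1, kernel_size=2))    # (8, 1, 19, 19)
# 361 filters: 361 maps per sample, each still 19x19.
print(conv_output_shape((8, 1, 20, 20), out_channels=361, kernel_size=2))  # (8, 361, 19, 19)
```

Only the second entry changes with the number of filters; the 19x19 comes from the kernel size and stride alone.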

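Finally, for the mean to be a vector of 8 entries, average over dims 1, 2 and 3 (channels, height and width), so that only the batch dimension is left. With dim=1 alone you only average over the channels and still get an (8, 19, 19) tensor.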

map0_avg = torch.mean(map0_conv, dim=[1,2,3])

The result is a vector of size 8.
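
If you want to check the shapes yourself, the following is a minimal sketch using random data in place of your tensor. The choice of out_channels=4 is arbitrary and only there for illustration; any number of filters gives the same final shape.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(8, 20, 20)  # random stand-in for the 8x20x20 tensor
conv0 = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=2)  # 4 filters, chosen arbitrarily

map0_conv = F.relu(conv0(x.unsqueeze(1)))
print(map0_conv.shape)    # torch.Size([8, 4, 19, 19])

# Averaging over dim=1 only collapses the filter dimension.
channel_avg = torch.mean(map0_conv, dim=1)
print(channel_avg.shape)  # torch.Size([8, 19, 19])

# Averaging over filters, height and width leaves one number per sample.
map0_avg = torch.mean(map0_conv, dim=[1, 2, 3])
print(map0_avg.shape)     # torch.Size([8])
```

Note that this averages across all filters as well, so each of the 8 entries summarizes every feature map computed for that sample.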