Concatenate CNN AvgPool with Embedding layer

I have an embedding layer where each sentence has a length of 20 and an embedding dimension of 16. I pass this through a 1D convolutional layer -> ReLU -> AvgPool, and the AvgPool output has the shape [128, 10, 10], where 128 is the batch size. Now I want to know how to concatenate this AvgPool output with the embedding of shape [128, 20, 16], so that I can pass the result to the next CNN layer. I have been stuck with the error: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 20 and 10 in dimension 1 at /pytorch/aten/src/TH/generic/THTensorMoreMath.cpp:1307.

self.c_1 = nn.Conv1d(in_channels=self.embedding_dim, out_channels=10,
                     kernel_size=2, padding=1)
self.relu_1 = nn.ReLU()
self.avg_pool_1 = nn.AvgPool1d(2)
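
For reference, a minimal repro of these shapes (the embedding is permuted so that the embedding dimension becomes the channel dimension, matching in_channels=self.embedding_dim above):

import torch
import torch.nn as nn

embed = torch.randn(128, 20, 16)  # [batch_size, seq_len, embedding_dim]
c_1 = nn.Conv1d(in_channels=16, out_channels=10, kernel_size=2, padding=1)
avg_pool_1 = nn.AvgPool1d(2)

out = c_1(embed.permute(0, 2, 1))  # -> [128, 10, 21]
out = torch.relu(out)              # ReLU keeps the shape
out = avg_pool_1(out)              # -> [128, 10, 10]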

As the error message states, you cannot concatenate these tensors, because two of their dimensions have different sizes.
As far as I understand, you would like to create something like a residual connection, where you pass the output of the embedding layer through this conv -> avg_pool block and then try to concatenate the block’s output with its input.

If that’s the case and your embedding output has a shape of [128, 20, 16], the output shape of the conv block will be torch.Size([128, 10, 8]):

import torch
import torch.nn as nn

x = torch.randn(128, 20, 16)  # [batch_size, seq_len, embedding_dim]
c_1 = nn.Conv1d(in_channels=20, out_channels=10, kernel_size=2, padding=1)
avg_pool_1 = nn.AvgPool1d(2)

output = c_1(x)              # -> [128, 10, 17]
output = avg_pool_1(output)  # -> [128, 10, 8]

Could you explain your use case a bit, as I’m not sure how you would like to concatenate these tensors now?

My goal is to concatenate my sentence embedding of shape [128, 20, 16] with the output of the average pool of shape [128, 10, 10] along dimension 2, i.e. 16 + 10 = 26 features. The output of this concatenation would then be the input to the next Conv1d layer.

torch.cat((embed.view(-1, sent.shape[1], embed.shape[2]), avg_pool_level_1), 2)

where the shape of embed is torch.Size([128, 20, 16]) and the shape of avg_pool_level_1 is torch.Size([128, 10, 10]).

Unfortunately, concatenating these tensors as they are won’t work, since they differ in two dimensions (dim1 and dim2), and torch.cat requires all dimensions except the concatenation dimension to match.
You could pad one of them in dim1 and then concatenate in dim2, as in the sketch below.
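
A minimal sketch of this pad-then-concatenate approach (the padding amount of 10 is specific to these example shapes):

import torch
import torch.nn.functional as F

embed = torch.randn(128, 20, 16)
avg_pool_level_1 = torch.randn(128, 10, 10)

# F.pad pads from the last dim backwards: (left, right, top, bottom),
# so (0, 0, 0, 10) pads dim1 from 10 to 20 and leaves dim2 untouched
padded = F.pad(avg_pool_level_1, (0, 0, 0, 10))  # -> [128, 20, 10]

out = torch.cat((embed, padded), dim=2)          # -> [128, 20, 26]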

Thanks @ptrblck. I wanted to know if there is a way to concatenate the input word embedding with the output of the CNN-pool layer, given that the dimensions keep shrinking depending on the size of the pool layer. Is there a technique that can be followed for this?

For a 1-dimensional signal, you could match the number of channels and then concatenate in dim2.
To do this, just set out_channels=20 in your conv layer, as in the sketch below.
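
A minimal sketch, reusing the example tensor from above:

import torch
import torch.nn as nn

x = torch.randn(128, 20, 16)
# out_channels=20 matches dim1 of the input, so we can concatenate in dim2
c_1 = nn.Conv1d(in_channels=20, out_channels=20, kernel_size=2, padding=1)
avg_pool_1 = nn.AvgPool1d(2)

out = avg_pool_1(c_1(x))          # -> [128, 20, 8]
out = torch.cat((x, out), dim=2)  # -> [128, 20, 24]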

Cool, thanks! Also, in PyTorch, how can we set the input dimension of an FC layer that comes after an avgpool if I don’t know what the output dimension of the avgpool will be? For example, if the avgpool output has shape [128, 10, 57] and an FC layer comes after it, I initially won’t be able to know the value 57 unless I execute the code once. Here 10 is out_channels, and 114 was the sequence length of each sentence, which was halved by the avgpool.

You could calculate the output shapes using the formulas in the docs; a worked example follows below.
If you don’t want to do that, you could just run a single iteration, add some print statements to show the output shapes, and change the number of input features accordingly.
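
For instance, applying the Conv1d output-size formula from the docs to your earlier kernel_size=2, padding=1 layer with a sequence length of 114:

# Conv1d: L_out = (L_in + 2*padding - dilation*(kernel_size - 1) - 1) // stride + 1
l_conv = (114 + 2 * 1 - 1 * (2 - 1) - 1) // 1 + 1  # 115
l_pool = l_conv // 2                               # 57, i.e. [128, 10, 57]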

However, if you are dealing with variable-sized inputs, I would recommend using an adaptive pooling layer, which will produce a defined output shape.
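
A minimal sketch with nn.AdaptiveAvgPool1d (the target length of 4 and the 2 output features are arbitrary choices for the example):

import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool1d(4)  # output length is always 4, regardless of L_in
fc = nn.Linear(10 * 4, 2)       # in_features known up front: channels * pooled length

x = torch.randn(128, 10, 57)    # any sequence length would work here
out = pool(x)                   # -> [128, 10, 4]
out = fc(out.flatten(1))        # -> [128, 2]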

Yes, so far I had been running a single iteration, printing the shapes, and modifying the values accordingly.