Concatenate CNN AvgPool with Embedding layer

I have an embedding layer where each sentence has a length of 20 and an embedding dimension of 16. I pass this through a 1D convolutional layer -> ReLU -> AvgPool, and the AvgPool output has the shape [128, 10, 10], where 128 is the batch size. Now I want to know how to concatenate this AvgPool output with the embedding of shape [128, 20, 16], so that I can pass the result to the next CNN layer. I have been stuck with the error: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 20 and 10 in dimension 1 at /pytorch/aten/src/TH/generic/THTensorMoreMath.cpp:1307.

self.c_1 = nn.Conv1d(in_channels=self.embedding_dim, out_channels=10,
                     kernel_size=2, padding=1)
self.relu_1 = nn.ReLU()
self.avg_pool_1 = nn.AvgPool1d(2)
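
For reference, a minimal repro of these shapes (the embedding is permuted so that the embedding dimension becomes the channel dimension, matching in_channels=self.embedding_dim above):

import torch
import torch.nn as nn

embed = torch.randn(128, 20, 16)  # [batch_size, seq_len, embedding_dim]
c_1 = nn.Conv1d(in_channels=16, out_channels=10, kernel_size=2, padding=1)
avg_pool_1 = nn.AvgPool1d(2)

out = c_1(embed.permute(0, 2, 1))  # -> [128, 10, 21]
out = torch.relu(out)              # ReLU keeps the shape
out = avg_pool_1(out)              # -> [128, 10, 10]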

As the error message states, you cannot concatenate these tensors, because two of their dimensions have different sizes.
As far as I understand, you would like to create something like a residual connection, where you pass the output of the embedding layer through this conv -> avg_pool block and then try to concatenate the block’s output with its input.

If that’s the case and your embedding output has a shape of [128, 20, 16], the output shape of the conv block will be torch.Size([128, 10, 8]):

import torch
import torch.nn as nn

x = torch.randn(128, 20, 16)  # [batch_size, seq_len, embedding_dim]
c_1 = nn.Conv1d(in_channels=20, out_channels=10, kernel_size=2, padding=1)
avg_pool_1 = nn.AvgPool1d(2)

output = c_1(x)              # -> [128, 10, 17]
output = avg_pool_1(output)  # -> [128, 10, 8]

Could you explain your use case a bit, as I’m not sure how you would like to concatenate these tensors now?

My goal is to concatenate my sentence embedding of shape [128, 20, 16] with the output of the average pool of shape [128, 10, 10] along dimension 2, i.e. 16 + 10 = 26 features. The output of this concatenation would then be the input to the next Conv1d layer.

torch.cat((embed.view(-1, sent.shape[1], embed.shape[2]), avg_pool_level_1), 2)

where the shape of embed is torch.Size([128, 20, 16]) and the shape of avg_pool_level_1 is torch.Size([128, 10, 10]).

Unfortunately, concatenating these tensors as they are won’t work, since they differ in two dimensions (dim1 and dim2), and torch.cat requires all dimensions except the concatenation dimension to match.
You could pad one of them in dim1 and then concatenate in dim2, as in the sketch below.
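
A minimal sketch of this pad-then-concatenate approach (the padding amount of 10 is specific to these example shapes):

import torch
import torch.nn.functional as F

embed = torch.randn(128, 20, 16)
avg_pool_level_1 = torch.randn(128, 10, 10)

# F.pad pads from the last dim backwards: (left, right, top, bottom),
# so (0, 0, 0, 10) pads dim1 from 10 to 20 and leaves dim2 untouched
padded = F.pad(avg_pool_level_1, (0, 0, 0, 10))  # -> [128, 20, 10]

out = torch.cat((embed, padded), dim=2)          # -> [128, 20, 26]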

Thanks @ptrblck. I wanted to know if there is a way to concatenate the input word embedding with the output of the CNN-pool layer, given that the dimensions keep shrinking depending on the size of the pool layer. Is there a technique that can be followed for this?

For a 1-dimensional signal, you could match the number of channels and then concatenate in dim2.
To do this, just set out_channels=20 in your conv layer, as in the sketch below.
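
A minimal sketch, reusing the example tensor from above:

import torch
import torch.nn as nn

x = torch.randn(128, 20, 16)
# out_channels=20 matches dim1 of the input, so we can concatenate in dim2
c_1 = nn.Conv1d(in_channels=20, out_channels=20, kernel_size=2, padding=1)
avg_pool_1 = nn.AvgPool1d(2)

out = avg_pool_1(c_1(x))          # -> [128, 20, 8]
out = torch.cat((x, out), dim=2)  # -> [128, 20, 24]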

Cool, thanks! Also, in PyTorch, how can we set the input dimension of an FC layer that comes after an avgpool if I don’t know what the output dimension of the avgpool will be? For example, if the avgpool output has shape [128, 10, 57] and an FC layer comes after it, I initially won’t be able to know the value 57 unless I execute the code once. Here 10 is out_channels, and 114 was the sequence length of each sentence, which was halved by the avgpool.

You could calculate the output shapes using the formulas in the docs; a worked example follows below.
If you don’t want to do that, you could just run a single iteration, add some print statements to show the output shapes, and change the number of input features accordingly.
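
For instance, applying the Conv1d output-size formula from the docs to your earlier kernel_size=2, padding=1 layer with a sequence length of 114:

# Conv1d: L_out = (L_in + 2*padding - dilation*(kernel_size - 1) - 1) // stride + 1
l_conv = (114 + 2 * 1 - 1 * (2 - 1) - 1) // 1 + 1  # 115
l_pool = l_conv // 2                               # 57, i.e. [128, 10, 57]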

However, if you are dealing with variable-sized inputs, I would recommend using an adaptive pooling layer, which will produce a defined output shape.
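
A minimal sketch with nn.AdaptiveAvgPool1d (the target length of 4 and the 2 output features are arbitrary choices for the example):

import torch
import torch.nn as nn

pool = nn.AdaptiveAvgPool1d(4)  # output length is always 4, regardless of L_in
fc = nn.Linear(10 * 4, 2)       # in_features known up front: channels * pooled length

x = torch.randn(128, 10, 57)    # any sequence length would work here
out = pool(x)                   # -> [128, 10, 4]
out = fc(out.flatten(1))        # -> [128, 2]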

Yes, so far I had been running a single iteration, printing the shapes, and modifying the values accordingly.