How to provide the textual input for convolution

Hi, I would like to apply a convolution over the embeddings of words of a sentence.

For example, let's say I have a sentence with 5 words and my embedding size is 128, so my input shape is 5x128. I would like to convolve this input with a convolution that has 1 input channel, 4 output channels, and a kernel of size 2x128. (I guess this is how convolution is used with textual input, right?)

How can I do this in PyTorch? Is the following the correct way to do it? My aim is to get a combination of the word representations of a sentence.

x = torch.rand(5,128)
w = nn.conv2d(1,2,128,4)

I’m not sure how to use convolutions with textual input, but based on your description, I assume you would like to use the 5 different words as one spatial dimension and the embedding size as the other.
If that’s the case, you could try the following:

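import torch
import torch.nn as nn

x = torch.randn(1, 1, 5, 128)  # batch_size, channels, height (words), width (embedding)
conv = nn.Conv2d(1, 4, (2, 128))  # 1 input channel, 4 filters, kernel spans 2 words x 128 dims
output = conv(x)  # shape: [1, 4, 4, 1]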
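
To make the shape behaviour concrete, here is a minimal sketch that runs the same kind of layer on sentences of different lengths. The names `embedding_dim`, `num_words`, and `shapes` are only illustrative and do not come from the posts above. Since the kernel covers the full embedding width, the output width should collapse to 1, and the output height should be `num_words - 2 + 1`.

```python
import torch
import torch.nn as nn

embedding_dim = 128  # illustrative value taken from the question
conv = nn.Conv2d(1, 4, (2, embedding_dim))  # 4 filters over 2 consecutive words

shapes = {}
for num_words in (5, 8, 20):
    # [batch, channels, words, embedding_dim]
    x = torch.randn(1, 1, num_words, embedding_dim)
    shapes[num_words] = tuple(conv(x).shape)
    print(num_words, shapes[num_words])
```

With default stride and no padding, each filter yields one value per pair of adjacent words.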

Hi @ptrblck

Sorry for the late response. Actually, that is not what I want to do.

I have attached an image that visualizes the model I have in mind (together with a link to the paper):


I think I got it working, but I am not sure. Could you review it, if possible? Let's assume we have 100 unique words in our vocabulary, and the sentence I am interested in contains 20 of them. My embedding vectors are 300-dimensional, and I will consider bi-grams and use 4 filters:

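import torch
import torch.nn as nn

embedding = nn.Embedding(100, 300)  # vocabulary of 100 words, 300-d vectors
in_channels, out_channels = 1, 4    # one input channel, 4 filters
kernel_h, kernel_w = 2, 300         # bi-grams spanning the full embedding width
conv = nn.Conv2d(in_channels, out_channels, (kernel_h, kernel_w))

token_ids = torch.randint(0, 100, (20,))  # 20 random word indices in [0, 99]
embeds = embedding(token_ids)             # shape: [20, 300]
embeds2 = embeds.view(1, 1, 20, 300)  # reshape it so that I can use it as input to the convolution
conv_out = conv(embeds2)
print(conv_out.shape)  # torch.Size([1, 4, 19, 1])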

If I am correct up to that point, I also wonder how I could perform the pooling operation. Basically, I would like to take the maximum value in each of the 1x19 feature vectors. Should I use 2D or 1D pooling for that? I couldn't work out from the docs which one is the correct choice for me.

If you want to keep the current shape with the trailing 1 for the width, you can use nn.MaxPool2d with a kernel_size of (19, 1):

nn.MaxPool2d(kernel_size=(19, 1), stride=1)
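
As a rough sketch of how this pooling step could be applied, the snippet below uses a random tensor as a stand-in for the convolution output of shape [1, 4, 19, 1]. The variable names (`pooled`, `features`, `features_alt`) are illustrative only. It also shows one possible alternative, `Tensor.amax`, which avoids hard-coding the length 19 and may be handier if sentence lengths vary.

```python
import torch
import torch.nn as nn

# Stand-in for the convolution output: [batch, filters, 19, 1]
conv_out = torch.randn(1, 4, 19, 1)

# Max over the 19 positions of each filter's feature map
pool = nn.MaxPool2d(kernel_size=(19, 1), stride=1)
pooled = pool(conv_out)       # shape: [1, 4, 1, 1]
features = pooled.flatten(1)  # shape: [1, 4], one value per filter

# Alternative (an assumption about what fits the use case): reduce over
# the two spatial dimensions directly, independent of sentence length
features_alt = conv_out.amax(dim=(2, 3))  # shape: [1, 4]

print(features.shape)
print(torch.equal(features, features_alt))
```

Both variants give one maximum per filter, so the sentence ends up represented by a vector with 4 entries.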