Hi, I would like to apply a convolution over the word embeddings of a sentence.
For example, let’s say I have a sentence with 5 words and my embedding size is 128, so my input shape is 5x128. I would like to convolve this input with a convolution that has 4 output channels, 1 input channel, and a kernel of size 2x128 (I guess this is how convolutions are used with textual input, right?)
How can I do this in PyTorch, and is the following the correct way? My aim is to get a combination of the word representations of a sentence:
x = torch.rand(5, 128).view(1, 1, 5, 128)  # add batch and channel dims for Conv2d
conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=(2, 128))
out = conv(x)  # shape: (1, 4, 4, 1)
I’m not sure how to use convolutions with textual input, but based on your description, I assume you would like to use the 5 different words as one spatial dimension and the embedding size as the other.
If that’s the case, you could try the following:
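A minimal sketch of that idea (the variable names are mine), assuming a single sentence of 5 words with 128-dimensional embeddings, treated as one 1-channel "image" of height 5 and width 128:

```python
import torch
import torch.nn as nn

# One sentence: 5 words, each a 128-dim embedding,
# reshaped to (batch=1, channels=1, height=5, width=128).
x = torch.rand(5, 128).view(1, 1, 5, 128)

# 4 filters, each spanning 2 words at a time across the full embedding width.
conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=(2, 128))

out = conv(x)
print(out.shape)  # torch.Size([1, 4, 4, 1]) -> 4 bi-gram positions per filter
```

Because the kernel width equals the embedding size, the width dimension collapses to 1 and the filters slide only over the word dimension, which matches the bi-gram intuition.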
I think I made it work, but I am not sure. Could you review it if possible? Let’s assume we have 100 unique words in our vocabulary and the sentence I am interested in contains 20 of them. My embedding vectors are 300-dimensional, and I will consider bi-grams and use 4 filters:
import torch
import torch.nn as nn
from random import randint
e = nn.Embedding(100, 300)
ic = 1; oc = 4; h = 2; w = 300
conv = nn.Conv2d(ic, oc, (h, w))  # 4 bi-gram filters spanning the full 300-d embedding
embeds = e(torch.tensor([randint(0, 99) for i in range(20)], dtype=torch.long))
embeds2 = embeds.view(1, 1, 20, 300)  # reshape to (batch, channels, words, embed_dim) for Conv2d
conv_out = conv(embeds2)
conv_out.shape
torch.Size([1, 4, 19, 1])
If I am correct up to this point, I also wonder how I could perform the pooling operation. Basically, I would like to take the maximum value in each of the 1x19 feature maps. Should I use MaxPool2d or MaxPool1d for that? I couldn’t tell from the docs which one is the correct choice for me.
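Not an authoritative answer, but one common approach (max-over-time pooling) is to squeeze the trailing width dimension and take the max over the 19 bi-gram positions; `F.max_pool1d` with the full length as kernel size gives the same result. A sketch, assuming `conv_out` has the shape `(1, 4, 19, 1)` from above:

```python
import torch
import torch.nn.functional as F

conv_out = torch.rand(1, 4, 19, 1)  # placeholder for the conv output above

# Option 1: drop the width dim, then max over the 19 positions per filter.
pooled = conv_out.squeeze(3).max(dim=2).values  # shape: (1, 4)

# Option 2: same result via max_pool1d with the sequence length as kernel size.
pooled2 = F.max_pool1d(conv_out.squeeze(3), kernel_size=conv_out.size(2)).squeeze(2)

print(pooled.shape)  # torch.Size([1, 4])
assert torch.allclose(pooled, pooled2)
```

Either way you end up with one scalar per filter, i.e. a 4-dim sentence feature in this example, so the 1d/2d choice is mostly cosmetic once the width dimension has been squeezed away.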