Hi,
I have a question about padding and its effect on a CNN text classification model. Say I have a sentence with 4 words, but because I want all the tensors in a batch to be the same size, I pad it. So I might have an unpadded and a padded sequence looking like [1,2,3,4] and [1,2,3,4,0,0,0]; each int here represents some token (words, say), and I've padded with 3 zeros so that every tensor in the batch has length 7. Suppose the vocabulary has size 100 and each word vector has dimension 5, and suppose I have 1 filter with a kernel of size 2, after which I want to apply max pooling across the whole (sentence-length) dimension.

It seems that if I don't do anything about the padding, the convolution output will have some extra junk at the end (from windows that overlap the pad tokens), and this might screw up max pooling or mean pooling. Should you remove those padded positions before pooling in such a model? Basically, m != m_padded in general in the code below. Is this something to be concerned about? I know that for RNNs there are pack/pad sequence utilities (pack_padded_sequence and friends), so my question is the analogous one for CNNs. It seems like, technically speaking, I am introducing padding effects without wanting to …
Thank you!
import torch
import torch.nn as nn

e = nn.Embedding(100, 5, padding_idx=0)
# 4 x 5 matrix: one 5-dim embedding per real token
x = e(torch.tensor([1, 2, 3, 4]))
# 7 x 5 matrix: the last 3 rows are the (all-zero) padding embedding
x_padded = e(torch.tensor([1, 2, 3, 4, 0, 0, 0]))
# The filter: 5 input channels (embedding dim), 1 output channel, kernel size 2.
f = nn.Conv1d(5, 1, 2)
# 1 x 3 matrix: conv over the 4 real tokens
z = f(x.t())
# 1 x 6: the last 3 outputs come from windows that overlap the padding
# (nonzero junk in general, because of the conv bias)
z_padded = f(x_padded.t())
# 1 x 1
m = nn.AvgPool1d(3)(z)
# 1 x 1, but generally m_padded != m
m_padded = nn.AvgPool1d(6)(z_padded)
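
P.S. One workaround I've been toying with (assuming I keep the original sentence lengths around) is to slice off the conv outputs whose windows overlap the padding before pooling. A minimal sketch, continuing from the code above; valid and m_trimmed are just names I made up here:

# Sketch: pool only over the conv outputs whose windows cover real tokens.
# With an original length of 4 and kernel size 2, only the first
# 4 - 2 + 1 = 3 conv outputs are free of padding.
length, kernel = 4, 2
valid = length - kernel + 1                             # 3 padding-free conv outputs
m_trimmed = nn.AvgPool1d(valid)(z_padded[:, :valid])    # equals m exactly
m_max = z_padded[:, :valid].max(dim=1).values           # masked max-pooling analogue

But this feels hacky, hence the question.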