Hi,
If I want to embed a batch of 2 samples of 4 indices each, I know I can do it as follows:
import torch
import torch.nn as nn
embedding = nn.Embedding(10, 3)
input = torch.LongTensor([[1,2,4,5],[4,3,2,9]])
embedding(input)
However, I don’t know how I can embed a batch of 2 samples with different lengths, like below:
import torch
import torch.nn as nn
embedding = nn.Embedding(10, 3)
input2 = torch.LongTensor([[1,2,4,5],[4,3,2]])
# This line raises an error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: expected sequence of length 4 at dim 1 (got 3)
I saw that I can provide a padding id and use it to balance the sequence lengths:
embedding = nn.Embedding(10, 3, padding_idx=0)
input = torch.LongTensor([[1,2,4,5],[4,3,2,0]])
embedding(input)
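For context, instead of writing the padded tensor by hand, I believe `torch.nn.utils.rnn.pad_sequence` can build it from the variable-length sequences automatically (this is my own sketch, assuming padding with 0 at the end is what I want):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

embedding = nn.Embedding(10, 3, padding_idx=0)

# The two variable-length samples from above
seqs = [torch.LongTensor([1, 2, 4, 5]), torch.LongTensor([4, 3, 2])]

# Pad the shorter sequence with 0 so both have length 4
padded = pad_sequence(seqs, batch_first=True, padding_value=0)
# padded == tensor([[1, 2, 4, 5], [4, 3, 2, 0]])

out = embedding(padded)  # shape: (2, 4, 3)
```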
I have 3 questions regarding this usage:
- Does this padding solve my problem? That is, can I use it as shown above to equalize the sequence lengths?
- Does it make a difference whether I put the padding token at the beginning of a sentence or at the end?
- Does this padding token affect the calculations during back-propagation?
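To probe the last question myself, I ran this small check (my own sketch): if I understand the docs correctly, the embedding row at `padding_idx` should keep a zero gradient, while rows for real tokens should get updated.

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(10, 3, padding_idx=0)

# Second sample padded with index 0 at the end
inp = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 0]])

out = embedding(inp)
out.sum().backward()

print(embedding.weight.grad[0])  # row for padding_idx
print(embedding.weight.grad[1])  # row for a real token (index 1)
```

When I run this, the gradient row for index 0 is all zeros, while the rows for the real indices are non-zero, which seems to suggest the padding token is excluded from the gradient update.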