Why can't embedding or RNN/LSTM handle variable-length sequences?

PyTorch's embedding or LSTM (I don't know about other DNN libraries) cannot handle variable-length sequences by default. I have seen various hacks to deal with variable lengths, but my question is: why is this the case? Sequences are almost never the same length, and an RNN/LSTM should simply loop until the end of each sequence, so why should it be sensitive to varying lengths within a minibatch? PyTorch's Embedding is a look-up table, and I don't see any reason for it to be sensitive to variable lengths either. Shouldn't the ideal case be that I can pass a minibatch of sentences with a variable number of words, like the following?

import torch
import torch.nn as nn

word_embedding = nn.Embedding(17, 5)
# each inner list is a sentence (note that the sentences have different lengths)
word_embeds = word_embedding(torch.tensor([[1, 2, 3, 4, 5], [4, 5, 6, 7]]))

Your example won't work, as you cannot create a tensor from inputs of different lengths.
Internally, each tensor holds its data as a contiguous blob and stores attributes such as its shape and stride. This makes it possible, e.g., to apply batched operations.
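
Here is a minimal sketch to illustrate the point (the values are arbitrary): a rectangular nested list becomes a tensor with a well-defined shape and stride, while a ragged one cannot be stored as a single blob and raises an error.

import torch

# rectangular input: shape and stride are well defined
t = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]])
print(t.shape)     # torch.Size([2, 4])
print(t.stride())  # (4, 1)

# ragged input: the rows have different lengths, so there is no single
# rectangular blob to store, and the construction raises a ValueError
try:
    torch.tensor([[1, 2, 3, 4, 5], [4, 5, 6, 7]])
except ValueError as e:
    print(e)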

As you can see in the docs of nn.Embedding, this layer will take LongTensors of arbitrary shape as its input.
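
A common workaround (a minimal sketch, not the only option; padding_idx=0 and the vocabulary size 17 are just example values taken from your snippet) is to pad the shorter sentences so the batch becomes rectangular, and tell nn.Embedding which index is used for padding:

import torch
import torch.nn as nn

# pad the shorter sentence with the padding index so both rows have length 5
word_embedding = nn.Embedding(17, 5, padding_idx=0)
batch = torch.tensor([[1, 2, 3, 4, 5],
                      [4, 5, 6, 7, 0]])  # 0 marks the padded position
word_embeds = word_embedding(batch)
print(word_embeds.shape)  # torch.Size([2, 5, 5])

For the RNN/LSTM part, torch.nn.utils.rnn.pad_sequence and pack_padded_sequence are the usual utilities to pad a list of variable-length sequences and let the recurrent layer skip the padded steps.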
