I am new to PyTorch and am trying to implement a character-level LSTM seq2seq model. Each sequence is the list of characters in one word, and several words form a minibatch, which also represents a sentence. My understanding is that each sequence (a list of character embeddings, in my case) produces one final hidden state. So if there are two character sequences (two words), there should be two final hidden states, one per word.

I am not even considering variable-length sequences yet. I also don't understand why a variable sequence length should be a problem. Shouldn't the LSTM keep looping for as long as a given sequence has elements? The number of iterations should not be fixed, right? Here is the code I tried:
import torch
import torch.nn as nn

# Embedding table: 17 possible characters, each mapped to a 5-dimensional vector
character_embedding = nn.Embedding(17, 5)
# LSTM with input (embedding) dimension 5 and hidden state dimension 3
lstm = nn.LSTM(5, 3)
# Each row is a word; there are two words with the same number of characters
char_embeds = character_embedding(torch.tensor([[1, 2, 3, 4, 5], [4, 5, 6, 7, 8]]))
# out should contain the hidden state for every character;
# hidden should contain the final hidden state for each sequence
out, hidden = lstm(char_embeds)
print("char_embeds: ")
print(char_embeds)
print("hidden: ")
print(hidden[0])
Output:
char_embeds:
tensor([[[ 1.0157, -0.2197, 1.6615, -1.2916, -0.6116],
[ 0.5630, -0.9618, 0.7287, -0.5727, 1.6796],
[ 0.9902, -0.5408, 0.9785, -1.1090, 1.1126],
[ 0.7472, 0.0440, 1.0629, -0.7375, 0.0828],
[ 0.6632, -0.4523, 0.5051, 2.6031, 0.3798]],
[[ 0.7472, 0.0440, 1.0629, -0.7375, 0.0828],
[ 0.6632, -0.4523, 0.5051, 2.6031, 0.3798],
[-0.6522, -3.2626, 0.7967, -1.0322, 0.4667],
[-0.5086, 0.5142, -0.7141, -1.5352, 0.4177],
[-0.0582, 1.3398, -0.2829, 0.1392, 1.0709]]],
grad_fn=<EmbeddingBackward>)
hidden:
tensor([[[-0.2774, 0.0724, -0.4297],
[-0.4580, 0.1563, -0.5811],
[-0.5492, -0.2314, 0.3473],
[-0.0772, 0.2474, -0.1026],
[-0.1042, 0.4394, -0.3582]]], grad_fn=<StackBackward>)
Here, I would expect two hidden states, since there are two sequences, but I am getting five. Why is that? What am I missing?
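One likely explanation for the five hidden states is the input layout. By default, nn.LSTM reads its input as (seq_len, batch, input_size), so a tensor of shape (2, 5, 5) is treated as 2 time steps over a batch of 5 sequences. The sketch below is a minimal comparison of the default layout with batch_first=True, assuming the intent is 2 words of 5 characters each. The names word_indices, lstm_default and lstm_batch_first are introduced here for illustration only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

character_embedding = nn.Embedding(17, 5)
# Two words, five characters each: shape (2, 5)
word_indices = torch.tensor([[1, 2, 3, 4, 5], [4, 5, 6, 7, 8]])
char_embeds = character_embedding(word_indices)  # shape (2, 5, 5)

# Default layout is (seq_len, batch, input_size):
# the tensor is read as 2 time steps over a batch of 5 sequences.
lstm_default = nn.LSTM(5, 3)
_, (h_default, _) = lstm_default(char_embeds)

# batch_first=True switches the input layout to (batch, seq_len, input_size):
# the tensor is read as a batch of 2 sequences with 5 time steps each.
lstm_batch_first = nn.LSTM(5, 3, batch_first=True)
out, (h_batch_first, _) = lstm_batch_first(char_embeds)

print(h_default.shape)      # torch.Size([1, 5, 3]) -> five final states
print(h_batch_first.shape)  # torch.Size([1, 2, 3]) -> one final state per word
print(out.shape)            # torch.Size([2, 5, 3]) -> one state per character
```

The final hidden state always has shape (num_layers * num_directions, batch, hidden_size), whatever batch_first is set to, so its middle dimension shows how many sequences the LSTM believes it received.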
My second question is: why can't an LSTM handle variable-length sequences?
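On variable lengths, the recurrence itself can run for any number of steps. The constraint comes from batching: a batch is a single rectangular tensor, so every sequence in it must have the same length. A common workaround is to pad the shorter sequences and then pack the batch so that the LSTM skips the padded steps. The sketch below is one way to do this, assuming index 0 is reserved for padding and using three made-up words of lengths 5, 3 and 4. The variable names are illustrative.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Assumption: character index 0 is reserved for padding.
character_embedding = nn.Embedding(17, 5, padding_idx=0)
lstm = nn.LSTM(5, 3, batch_first=True)

# Three hypothetical words of different lengths, as character indices.
words = [
    torch.tensor([1, 2, 3, 4, 5]),
    torch.tensor([4, 5, 6]),
    torch.tensor([7, 8, 9, 10]),
]
lengths = torch.tensor([len(w) for w in words])

# Pad to a rectangular (batch, max_len) tensor, then embed: (3, 5, 5).
padded = pad_sequence(words, batch_first=True, padding_value=0)
embedded = character_embedding(padded)

# Packing records the true lengths so padded steps are never processed.
packed = pack_padded_sequence(
    embedded, lengths, batch_first=True, enforce_sorted=False
)
packed_out, (h_n, c_n) = lstm(packed)

# Unpack the per-character outputs back to a padded (batch, max_len, hidden) tensor.
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)

print(h_n.shape)  # torch.Size([1, 3, 3]) -> one final state per word
print(out.shape)  # torch.Size([3, 5, 3])
```

With packing, h_n[0, i] is the hidden state at the last real character of word i, not at a padded position, so it matches what running each word through the LSTM on its own would give.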