Hi everyone,
I am trying to process a sentence word by word with a BiLSTM. My aim is to concatenate the forward and backward LSTMs' hidden states right after each token is processed.
For example, let's assume our sentence is TOK1, TOK2, TOK3. The forward LSTM processes the sequence in the order TOK1, TOK2, TOK3 and produces the hidden states hf1, hf2, hf3. The backward LSTM processes it in the order TOK3, TOK2, TOK1 and produces the hidden states hb1, hb2, hb3. (hb1 is the hidden state created just after TOK3 is processed.)
So, I want to combine: hb1 with hf3, hb2 with hf2, and hb3 with hf1.
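Spelled out with toy tensors (random placeholders, not real LSTM outputs), the per-token representation I am after would be something like:

import torch

hidden_dim = 200
# Toy stand-ins for the six hidden states above:
hf1, hf2, hf3 = (torch.randn(hidden_dim) for _ in range(3))
hb1, hb2, hb3 = (torch.randn(hidden_dim) for _ in range(3))

# One vector of size 2*hidden_dim per token:
tok1_repr = torch.cat([hf1, hb3])  # both directions have just finished TOK1
tok2_repr = torch.cat([hf2, hb2])  # ... TOK2
tok3_repr = torch.cat([hf3, hb1])  # ... TOK3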
To do that, I did the following:
import torch
import torch.nn as nn

hidden_dim = 200
embedding_dim = 10
vocab_size = 6
bs = 1

ids = torch.tensor([1, 5, 4])                 # TOK1, TOK2, TOK3

embedding = nn.Embedding(vocab_size, embedding_dim)
lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=1, bidirectional=True)

embeds = embedding(ids).unsqueeze(1)          # (seq_len, bs, embedding_dim) = (3, 1, 10)
output, _ = lstm(embeds)                      # (3, 1, 2 * hidden_dim)
output = output.view(3, bs, 2, hidden_dim)    # 3 = sequence_length, 2 = number of directions
What I don't know is the order in which the backward LSTM's hidden states are stored in this output variable. For example, is output[2, 0, 1, :] the hidden state hb1 or hb3, in the notation above?
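One check that occurred to me while writing this up: the LSTM also returns h_n, and if I read the docs correctly, for a single-layer bidirectional LSTM h_n[0] is the forward direction's final state (hf3 in my notation) and h_n[1] is the backward direction's final state (hb3). So comparing h_n against slices of output should reveal the layout (reusing embeds, lstm, and bs from the snippet above):

output, (h_n, _) = lstm(embeds)                     # h_n: (2, bs, hidden_dim)
output = output.view(3, bs, 2, hidden_dim)

print(torch.allclose(output[2, 0, 0], h_n[0, 0]))   # forward slot at t=2 should be hf3
print(torch.allclose(output[0, 0, 1], h_n[1, 0]))   # is the backward slot at t=0 hb3 ...
print(torch.allclose(output[2, 0, 1], h_n[1, 0]))   # ... or is hb3 at t=2?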
EDIT
Before reshaping the output variable, it was a tensor of shape (3, bs, 2*hidden_dim). In that version, are the hidden states stored as I asked, or differently? I could have answered these questions myself if I had been able to set the forward and backward weights of the BiLSTM to the same values.
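For completeness, here is the experiment I had in mind, as a sketch. It assumes nn.LSTM's usual parameter naming, where the backward direction's parameters end in _reverse. With identical weights in both directions, hb1 (the backward state right after TOK3) must equal the forward state after a single step on the reversed sequence, so comparing that state against slices of output shows where hb1 is stored:

import torch
import torch.nn as nn

torch.manual_seed(0)

hidden_dim = 200
embedding_dim = 10
vocab_size = 6
bs = 1

embedding = nn.Embedding(vocab_size, embedding_dim)
lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=1, bidirectional=True)

# Copy the forward direction's parameters into the backward direction:
with torch.no_grad():
    lstm.weight_ih_l0_reverse.copy_(lstm.weight_ih_l0)
    lstm.weight_hh_l0_reverse.copy_(lstm.weight_hh_l0)
    lstm.bias_ih_l0_reverse.copy_(lstm.bias_ih_l0)
    lstm.bias_hh_l0_reverse.copy_(lstm.bias_hh_l0)

ids = torch.tensor([1, 5, 4])                 # TOK1, TOK2, TOK3
embeds = embedding(ids).unsqueeze(1)          # (3, 1, embedding_dim)
output, _ = lstm(embeds)
output = output.view(3, bs, 2, hidden_dim)

# Forward direction run on the flipped sequence: its first step consumes
# TOK3, so rev_out[0, 0, 0] equals hb1 when the weights are identical.
rev_out, _ = lstm(embedding(ids.flip(0)).unsqueeze(1))
rev_out = rev_out.view(3, bs, 2, hidden_dim)
hb1 = rev_out[0, 0, 0]

print(torch.allclose(hb1, output[0, 0, 1]))   # is hb1 stored at t=0 ...
print(torch.allclose(hb1, output[2, 0, 1]))   # ... or at t=2?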