How does a multi-layer bidirectional LSTM work? (solved)

huanghuang · December 11, 2017, 1:57am

Hi guys, I have a question about using LSTM. My data set consists of many word sequences, and each word sequence is a sentence. There are two ways to use the LSTM. The first is to input a single 3D tensor of size sentence_length * 1 * word_embedding_length. The second is to use a for loop and, in each iteration, input a 3D tensor of size 1 * 1 * word_embedding_length. Could someone tell me the difference between these two ways? Are they equivalent?

huanghuang · December 11, 2017, 2:51am

I have confirmed in experiments that these two ways are equivalent. The second way also gives access to the hidden state at every time step.
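A minimal sketch of such an experiment, assuming a single-layer, unidirectional `nn.LSTM` and a recent PyTorch version where plain tensors can be passed directly. The sizes, the zero initial state, and the variable names are illustrative choices, not taken from the original experiment.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, emb_dim, hidden_dim = 5, 4, 3  # illustrative sizes

lstm = nn.LSTM(emb_dim, hidden_dim)          # one layer, unidirectional
sentence = torch.randn(seq_len, 1, emb_dim)  # (seq_len, batch=1, emb_dim)
h0 = torch.zeros(1, 1, hidden_dim)
c0 = torch.zeros(1, 1, hidden_dim)

with torch.no_grad():
    # Way 1: feed the whole sentence in one call.
    out_full, (h_full, c_full) = lstm(sentence, (h0, c0))

    # Way 2: feed one word per call, carrying (h, c) forward.
    hidden = (h0, c0)
    step_outs, step_states = [], []
    for t in range(seq_len):
        out_t, hidden = lstm(sentence[t].view(1, 1, -1), hidden)
        step_outs.append(out_t)
        step_states.append(hidden)  # (h_t, c_t) after every time step
    out_loop = torch.cat(step_outs, dim=0)

# Compare per-step outputs and the final (h, c) of both ways.
same_outputs = torch.allclose(out_full, out_loop, atol=1e-5)
same_final = (torch.allclose(h_full, hidden[0], atol=1e-5)
              and torch.allclose(c_full, hidden[1], atol=1e-5))
print(same_outputs, same_final)
```

Note that the output of the single call already contains the top-layer hidden state h_t for every time step; the loop is mainly needed when the cell state c_t at every step is also wanted.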

huanghuang · December 11, 2017, 7:57am

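import torch
import torch.nn as nn
from torch import autograd

# 2-layer bidirectional LSTM: input size 4, hidden size 3
lstm = nn.LSTM(4, 3, 2, bidirectional=True)
inputs = [autograd.Variable(torch.randn((1, 4)))
          for _ in range(4)]

# (h_0, c_0), each of shape (num_layers * num_directions, batch, hidden_size)
hidden = (autograd.Variable(torch.randn((4, 1, 3))),
          autograd.Variable(torch.randn((4, 1, 3))))
hidden_ = hidden  # keep the initial state for the whole-sequence call below
outs = []
hiddens = []
# Second way: feed one word per call and carry the hidden state forward
for i in inputs:
    out, hidden = lstm(i.view(1, 1, -1), hidden)
    outs.append(out)
    hiddens.append(hidden)
    print(out)
    # print('******************')
    print(hidden)

print('**********************************************')

# First way: feed the whole sequence in a single call
inputs_ = torch.cat(inputs).view(-1, 1, 4)
out_, hidden_ = lstm(inputs_, hidden_)
print(out_)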

I used the code above to understand LSTM and reached this conclusion: for a unidirectional LSTM the two ways are equivalent, but for a bidirectional LSTM the outputs are different. I think feeding the whole sequence as input is the right way for a bidirectional LSTM. Could someone confirm whether this is correct?
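The difference can be localized with a small check. The sketch below assumes a single-layer bidirectional `nn.LSTM` (fewer layers than the snippet above, chosen so that the two directions can be compared separately) and a recent PyTorch version; the names and sizes are illustrative. In a one-word call the backward direction only ever sees that one word, so it cannot read the sentence from the end, whereas the forward direction is unaffected.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, emb_dim, hidden_dim = 4, 4, 3  # illustrative sizes

# Assumption: one layer, so each half of the output belongs to one direction.
lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=1, bidirectional=True)
sentence = torch.randn(seq_len, 1, emb_dim)
h0 = torch.randn(2, 1, hidden_dim)  # (num_directions, batch, hidden)
c0 = torch.randn(2, 1, hidden_dim)

with torch.no_grad():
    # Whole sequence: the backward direction runs from the last word to the first.
    out_full, _ = lstm(sentence, (h0, c0))

    # One word per call: each call is a length-1 sequence for both directions.
    hidden = (h0, c0)
    outs = []
    for t in range(seq_len):
        out_t, hidden = lstm(sentence[t:t + 1], hidden)
        outs.append(out_t)
    out_loop = torch.cat(outs, dim=0)

# The output's last dimension is [forward | backward], each of size hidden_dim.
fwd_match = torch.allclose(out_full[..., :hidden_dim],
                           out_loop[..., :hidden_dim], atol=1e-5)
bwd_match = torch.allclose(out_full[..., hidden_dim:],
                           out_loop[..., hidden_dim:], atol=1e-5)
print("forward halves match:", fwd_match)
print("backward halves match:", bwd_match)
```

In this sketch the forward halves agree and the backward halves do not, which is consistent with the conclusion that a bidirectional LSTM should receive the whole sequence in one call. With two or more layers, as in the snippet above, the second layer consumes both halves of the first layer's output, so the forward half of the final output would also be expected to differ.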