How does a multi-layer bidirectional LSTM work? (solved)

Hi guys, I have a question about using LSTM. The data set consists of many word sequences, and each word sequence is a sentence. There are two ways to use an LSTM. One is to input a 3D tensor of size sentence_length * 1 * word_embedding_length. The other is to use a for loop and, at each iteration, input a 3D tensor of size 1 * 1 * word_embedding_length. Could someone tell me the difference between these two ways? Are they equivalent?

It has been confirmed in experiments that these two ways are equivalent. The second way also gives you the hidden state at each time step.

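import torch
import torch.nn as nn
from torch import autograd

# 2-layer bidirectional LSTM: input size 4, hidden size 3
lstm = nn.LSTM(4, 3, 2, bidirectional=True)
inputs = [autograd.Variable(torch.randn((1, 4)))
          for _ in range(4)]

# initial (h_0, c_0), each of shape (num_layers * num_directions, batch, hidden_size)
hidden = (autograd.Variable(torch.randn((4, 1, 3))),
          autograd.Variable(torch.randn((4, 1, 3))))
hidden_ = hidden  # keep the initial state for the second run
outs = []
hiddens = []
# second way: feed one time step per call
for i in inputs:
    out, hidden = lstm(i.view(1, 1, -1), hidden)
    outs.append(out)
    hiddens.append(hidden)
    print(out)
    #print('******************')

print(hidden)

print('**********************************************')

# first way: feed the whole sequence in one call
inputs_ = torch.cat(inputs).view(-1, 1, 4)
out_, hidden_ = lstm(inputs_, hidden_)
print(out_)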

I used the code above to understand LSTM, and my conclusion is: for a unidirectional LSTM the two ways are equivalent, but for a bidirectional LSTM the outputs are different. I think feeding the whole sequence as input is the right way for a bidirectional LSTM. Could someone confirm whether this is correct?
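
For anyone who wants to check this numerically instead of comparing printed tensors by eye, here is a minimal sketch. The helper name compare and the fixed seed are my own choices for illustration, not part of the original experiment. It runs the same LSTM both ways from the same initial state and compares the outputs with torch.allclose. The reason the bidirectional case should differ is that the backward direction has to read the sequence from the last step to the first. When you feed one step per call, each call only sees a length-1 sequence, so the backward direction never gets any future context.

```python
import torch
import torch.nn as nn


def compare(bidirectional, num_layers=2, seq_len=4, input_size=4, hidden_size=3):
    """Run one LSTM on the whole sequence and step by step; return both outputs."""
    torch.manual_seed(0)  # fixed seed so the run is repeatable
    lstm = nn.LSTM(input_size, hidden_size, num_layers, bidirectional=bidirectional)
    num_directions = 2 if bidirectional else 1

    x = torch.randn(seq_len, 1, input_size)  # (seq_len, batch, input_size)
    h0 = torch.randn(num_layers * num_directions, 1, hidden_size)
    c0 = torch.randn(num_layers * num_directions, 1, hidden_size)

    with torch.no_grad():
        # whole sequence in a single call
        full_out, _ = lstm(x, (h0, c0))

        # one time step per call, carrying the hidden state forward
        hidden = (h0, c0)
        steps = []
        for t in range(seq_len):
            out, hidden = lstm(x[t:t + 1], hidden)
            steps.append(out)
        step_out = torch.cat(steps, dim=0)

    return full_out, step_out


uni_full, uni_step = compare(bidirectional=False)
bi_full, bi_step = compare(bidirectional=True)

uni_match = torch.allclose(uni_full, uni_step, atol=1e-6)
bi_match = torch.allclose(bi_full, bi_step, atol=1e-6)

print("unidirectional outputs match:", uni_match)
print("bidirectional outputs match:", bi_match)
```

With this setup the unidirectional outputs should match and the bidirectional ones should not, which agrees with the conclusion above: for a bidirectional LSTM, pass the whole sequence in one call. If you also need per-step values, the output tensor of the full-sequence call already holds the last layer's hidden state for every time step.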