How to combine bidirectional LSTM sequentially

morgankohler · May 8, 2021, 3:17pm

The LSTM class includes a num_layers argument which stacks LSTMs sequentially. I am wondering how to do the same thing without using this argument. That is, I initialize several bidirectional LSTMs with num_layers=1 and want to pass data through them one after another.

Say the input to the LSTM is of shape [10, 16, 64], where 10 is the sequence length, 16 is the batch size, and 64 is the dimension of the input. When this is passed into a bidirectional LSTM with hidden_size=64, the output becomes [10, 16, 128], which can also be viewed as [10, 16, 2, 64], where the 2 comes from the two directions (forward and backward) in which the input is processed.
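To make the shapes concrete, here is a minimal sketch of that reshaping. It assumes hidden_size=64 and the default batch_first=False layout; the variable names are only illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

seq_len, batch, input_size, hidden_size = 10, 16, 64, 64  # assumed sizes
lstm = nn.LSTM(input_size, hidden_size, num_layers=1, bidirectional=True)

x = torch.randn(seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)  # output: [10, 16, 128]

# Split the last dimension into (direction, hidden_size):
# index 0 is the forward direction, index 1 is the backward direction.
per_direction = output.view(seq_len, batch, 2, hidden_size)  # [10, 16, 2, 64]
forward_out = per_direction[:, :, 0, :]
backward_out = per_direction[:, :, 1, :]

# The forward direction finishes at the last time step,
# the backward direction finishes at the first time step.
fwd_matches = torch.allclose(h_n[0], forward_out[-1])
bwd_matches = torch.allclose(h_n[1], backward_out[0])
```

The two allclose checks compare each direction's final hidden state in h_n with the matching slice of output, which is one way to confirm which half of the 128 features belongs to which direction.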

Since the expected input shape is the same whether the LSTM is bidirectional or not, (seq_len, batch, input_size), how should I pass the bidirectional output of the first LSTM to the second? As of now I am just passing the entire (10, 16, 128) output to the next LSTM, so each direction of the second LSTM sees the full 128-dimensional vector containing both directions of the previous LSTM. How is this handled within the LSTM class itself?

ariG23498 · May 9, 2021, 7:01am

Hey @morgankohler
As far as I know, bidirectional LSTMs can merge the hidden states of the two directions in different ways (for example, concatenation or summation). If you get 2*hidden_size, the forward and backward vectors are being concatenated along the feature dimension, which is what nn.LSTM does. That concatenated vector is what the next LSTM stacked on top, whether bidirectional or unidirectional, receives as its input.

Having said that, I feel the way you have organised the code is good to go.
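If you want to check this yourself, here is a rough sketch. It assumes default nn.LSTM settings (no dropout, batch_first=False) and runs on CPU; the names stacked, lstm1 and lstm2 are just illustrative. It copies the weights of a num_layers=2 bidirectional LSTM into two separate single-layer ones and compares the outputs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

seq_len, batch, input_size, hidden_size = 10, 16, 64, 64  # assumed sizes

# Reference: one module with two stacked bidirectional layers.
stacked = nn.LSTM(input_size, hidden_size, num_layers=2, bidirectional=True)

# Manual stacking: the second LSTM takes 2 * hidden_size input features.
lstm1 = nn.LSTM(input_size, hidden_size, num_layers=1, bidirectional=True)
lstm2 = nn.LSTM(2 * hidden_size, hidden_size, num_layers=1, bidirectional=True)

# Copy layer 0 of `stacked` into lstm1 and layer 1 into lstm2,
# for both the forward and the reverse direction.
with torch.no_grad():
    for layer, module in enumerate([lstm1, lstm2]):
        for suffix in ("", "_reverse"):
            for name in ("weight_ih", "weight_hh", "bias_ih", "bias_hh"):
                src = getattr(stacked, f"{name}_l{layer}{suffix}")
                getattr(module, f"{name}_l0{suffix}").copy_(src)

x = torch.randn(seq_len, batch, input_size)

out_stacked, _ = stacked(x)
out1, _ = lstm1(x)           # [10, 16, 128]
out_manual, _ = lstm2(out1)  # the full 128-dim output goes to both directions

# Input width of the second layer inside the stacked module.
second_layer_in_features = stacked.weight_ih_l1.shape[1]
same = torch.allclose(out_stacked, out_manual, atol=1e-5)
```

Here stacked.weight_ih_l1 has 2 * hidden_size input columns, so the second layer inside nn.LSTM also consumes the concatenated output of both directions, which matches passing the whole (10, 16, 128) tensor to the next LSTM. Note that with a nonzero dropout argument the stacked module would also apply dropout between layers, which the manual version above does not do.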