How to use a stacked BiLSTM with Luong attention

I am trying to implement a seq2seq model with attention. I want to use nn.LSTM with bidirectional=True and n_layers > 1.

I am confused about n_layers. Here is the documentation:
* **num_layers** – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1

Because the input of the second layer is the output of the first layer, I expect that the hidden state shape should incorporate n_layers.

I understand that with bidirectional=True I can add the hidden states of both directions, but what should we do when n_layers > 1? Most of the code online uses only 1 layer for the encoder and decoder!
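For reference, here is a minimal sketch of the shapes I am asking about (the sizes are arbitrary, just to show how num_layers and num_directions show up):

```python
import torch
import torch.nn as nn

# hypothetical sizes, just for illustration
lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=2,
               bidirectional=True, batch_first=True)

x = torch.randn(4, 10, 8)            # (batch, seq_len, input_size)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # (4, 10, 32) -> (batch, seq_len, num_directions * hidden_size), last layer only
print(h_n.shape)     # (4, 4, 16)  -> (num_layers * num_directions, batch, hidden_size)
```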

You can have a look at my implementation; it’s an RNN-based classifier with attention, i.e., only the encoder part of a seq2seq model, but that’s all you need. The code is a bit verbose because the classifier is configurable: GRU or LSTM, uni- or bidirectional, different numbers of RNN layers, and different numbers of linear layers at the end. In a nutshell:

  • Yes, you can add the 2 directions. In my implementation, I concatenate them: notice the self.params.rnn_hidden_dim * self.num_directions when creating the Attention layer. You wouldn’t need this if you added the 2 directions, of course. Don’t ask me which solution is better/preferred :).

  • Regarding the number of RNN layers, just attend over the last layer. That way, it doesn’t matter at all how many layers you have. Check out the forward() method: rnn_output is all you need, as it gives you all states of the sequence with respect to the last RNN layer (see the sketch after this list).
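To illustrate both points, here is a minimal, self-contained sketch. It is not the exact code from my classifier; the names (Encoder, rnn_hidden_dim, attn) and the Luong "general" score are just one way to set it up. It concatenates the two directions of the last layer’s final hidden state and attends over rnn_output, which only contains the last layer’s states:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, input_dim, rnn_hidden_dim, num_layers, bidirectional=True):
        super().__init__()
        self.num_directions = 2 if bidirectional else 1
        self.rnn = nn.LSTM(input_dim, rnn_hidden_dim, num_layers=num_layers,
                           bidirectional=bidirectional, batch_first=True)
        # Attention works in the space of the (possibly concatenated) hidden state.
        attn_dim = rnn_hidden_dim * self.num_directions
        self.attn = nn.Linear(attn_dim, attn_dim, bias=False)  # Luong "general" score

    def forward(self, x):
        # rnn_output: (batch, seq_len, num_directions * hidden) -- last layer only,
        # so the number of stacked layers does not matter here.
        rnn_output, (h_n, _) = self.rnn(x)

        # h_n: (num_layers * num_directions, batch, hidden).
        # Separate layers and directions, keep the last layer, concatenate its directions.
        h_n = h_n.view(-1, self.num_directions, x.size(0), h_n.size(-1))
        last = torch.cat([h_n[-1, d] for d in range(self.num_directions)], dim=-1)
        # last: (batch, num_directions * hidden)

        # Luong-style attention of the final state over all time steps of the last layer.
        scores = torch.bmm(rnn_output, self.attn(last).unsqueeze(2)).squeeze(2)  # (batch, seq_len)
        weights = F.softmax(scores, dim=1)
        context = torch.bmm(weights.unsqueeze(1), rnn_output).squeeze(1)         # (batch, attn_dim)
        return context, weights
```

If you prefer to sum the two directions instead of concatenating them, the attention dimension stays at rnn_hidden_dim and the `* self.num_directions` factor goes away; everything else stays the same.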

I hope that helps a bit.