I am trying to implement a seq2seq model with attention. I want to use nn.LSTM with bidirectional=True and n_layers > 1.
I am confused about n_layers. Here is the documentation:
* **num_layers** – Number of recurrent layers. E.g., setting num_layers=2
would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1
Because the input of the second layer is the output of the first layer, I expect the hidden state shape to incorporate n_layers.
I understand that with bidirectional=True I can add the two directions of the hidden state, but how should this be handled when n_layers > 1? Most of the code online uses only 1 layer for the encoder and decoder!
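For concreteness, here is a minimal sketch of the shapes I am asking about (toy dimensions, purely for illustration):

```python
import torch
import torch.nn as nn

# Toy dimensions, just to show the shapes
batch_size, seq_len, input_dim, hidden_dim, n_layers = 4, 10, 8, 16, 2

lstm = nn.LSTM(input_dim, hidden_dim, num_layers=n_layers,
               bidirectional=True, batch_first=True)

x = torch.randn(batch_size, seq_len, input_dim)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([4, 10, 32]) -> (batch, seq_len, 2 * hidden_dim)
print(h_n.shape)     # torch.Size([4, 4, 16])  -> (n_layers * 2, batch, hidden_dim)
```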
You can have a look at my implementation; it’s an RNN-based classifier with attention, i.e., only the encoder part of a Seq2Seq model. But that’s all you need. The code is a bit verbose because the classifier is configurable: GRU or LSTM, uni- or bidirectional, different numbers of RNN layers, different numbers of linear layers at the end. In a nutshell:
Yes, you can add the 2 directions. In my implementation, I concatenate them: notice the
`self.params.rnn_hidden_dim * self.num_directions` when creating the Attention layer. You wouldn’t need this if you add (sum) the 2 directions instead, of course. Don’t ask me which solution is better/preferred :).
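As a rough sketch (not my exact code, names are illustrative), extracting the last layer’s hidden state from a bidirectional, multi-layer LSTM and concatenating the two directions could look like this:

```python
import torch

def last_layer_hidden(h_n, num_layers, num_directions, hidden_dim):
    # h_n: (num_layers * num_directions, batch, hidden_dim)
    batch_size = h_n.size(1)
    # Separate the layer and direction dimensions
    h_n = h_n.view(num_layers, num_directions, batch_size, hidden_dim)
    # Keep only the last layer: (num_directions, batch, hidden_dim)
    last = h_n[-1]
    if num_directions == 2:
        # Concatenate forward and backward states -> (batch, 2 * hidden_dim)
        return torch.cat([last[0], last[1]], dim=1)
        # If you summed them instead (last[0] + last[1]), the attention layer
        # could stay at hidden_dim and you wouldn't need the * num_directions.
    return last[0]  # (batch, hidden_dim)
```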
Regarding the number of RNN layers, just attend over the last layer. That way, it doesn’t matter at all how many layers you have. Check out the
`rnn_output`; it’s all you need, as it gives you all the states of the sequence with respect to the last RNN layer.
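As a minimal sketch of what “attend over the last layer” means (simple dot-product attention; names here are illustrative, not my exact code):

```python
import torch
import torch.nn.functional as F

def attend(rnn_output, query):
    # rnn_output: (batch, seq_len, hidden_dim * num_directions)
    #             already contains only the last layer's states
    # query:      (batch, hidden_dim * num_directions), e.g. the last hidden state
    scores = torch.bmm(rnn_output, query.unsqueeze(2)).squeeze(2)   # (batch, seq_len)
    weights = F.softmax(scores, dim=1)                              # (batch, seq_len)
    context = torch.bmm(weights.unsqueeze(1), rnn_output).squeeze(1)
    return context, weights  # context: (batch, hidden_dim * num_directions)
```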
I hope that helps a bit.