The PyTorch chatbot tutorial implements Luong attention and initializes the decoder's hidden state from the encoder's final hidden state via:
# Set initial decoder hidden state to the encoder's final hidden state
decoder_hidden = encoder_hidden[:decoder.n_layers]
I believe it should be
decoder_hidden = encoder_hidden[-decoder.n_layers:]
if the intent is to extract the hidden states of the last (topmost) stacked LSTM layers. Am I missing something?
In this case it would not make much difference, because the number of layers is the same for both the encoder and the decoder. However, if the numbers of layers are unequal, then the choice of which layers to use (the first decoder.n_layers or the last decoder.n_layers) is purely a design decision.
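To make the difference concrete: the two slices pick opposite ends of the layer stack. Here is a minimal pure-Python sketch, using a list of labels to stand in for the `(num_layers, batch, hidden)` tensor (the layer names are purely illustrative):

```python
# Hypothetical 4-layer encoder hidden state: index 0 is the bottom layer,
# index 3 the top layer (the one closest to the encoder's output).
encoder_hidden = ["layer0_h", "layer1_h", "layer2_h", "layer3_h"]

n_layers = 2  # a decoder shallower than the encoder

first = encoder_hidden[:n_layers]    # bottom layers: ['layer0_h', 'layer1_h']
last = encoder_hidden[-n_layers:]    # top layers:    ['layer2_h', 'layer3_h']

print(first)
print(last)
```

With equal depths both slices return the whole stack, which is why the tutorial's version still works there.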
Thanks again. By the way, if I design a bidirectional encoder with a stacked RNN, the encoder's hidden state ($h_n$) has shape
[# layers * # dirs, Batch, Hidden], which doesn't match the shape of $H$,
[Seq_len, Batch, # dirs * Hidden], for feeding as the decoder's initial hidden state. How can I solve this? Just employ an FC layer?
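One common way to handle this (not the only one) is to separate the layer and direction axes of $h_n$, concatenate the forward and backward states per layer, and project back down to the decoder's hidden size with a linear layer. A minimal sketch, assuming the decoder is unidirectional with the same number of layers and hidden size as the encoder (all sizes here are made up for illustration):

```python
import torch
import torch.nn as nn

num_layers, num_dirs, batch, hid = 2, 2, 3, 4

# h_n from a stacked bidirectional RNN: [num_layers * num_dirs, batch, hid]
h_n = torch.randn(num_layers * num_dirs, batch, hid)

# Separate the layer and direction axes: [num_layers, num_dirs, batch, hid]
h_n = h_n.view(num_layers, num_dirs, batch, hid)

# Concatenate forward and backward states per layer: [num_layers, batch, 2 * hid]
h_cat = torch.cat([h_n[:, 0], h_n[:, 1]], dim=-1)

# Project to the decoder's hidden size with an FC layer: [num_layers, batch, hid]
fc = nn.Linear(2 * hid, hid)
decoder_hidden = torch.tanh(fc(h_cat))

print(decoder_hidden.shape)  # torch.Size([2, 3, 4])
```

An alternative, if you don't want an extra parameterized layer, is to simply sum the forward and backward states instead of concatenating and projecting.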
The answer to this question depends on how many layers and directions your decoder has. If you did not come across this earlier, I highly recommend @bentrevett’s (a huge shoutout to him assuming he is the same as the owner of this repo) pytorch-seq2seq tutorials where he walks you through all the shape transformations inside the encoder and decoder.
I actually saw that, but couldn't find anything that helps me solve my problem. The linked tutorial only implements a 1-layer LSTM.