Is the PyTorch tutorial wrong?

I_H_Yoo · June 23, 2020, 6:46pm

The PyTorch chatbot tutorial implements Luong attention and gets the encoder’s hidden state via:

# Set initial decoder hidden state to the encoder's final hidden state
decoder_hidden = encoder_hidden[:decoder.n_layers]

I believe it should be decoder_hidden = encoder_hidden[-decoder.n_layers:] if the goal is to extract the final hidden state of the last layer of the stacked LSTM. Am I missing something?

harsha_g · June 23, 2020, 9:06pm

In this case, it would not make much difference because the number of layers is the same for both the encoder and the decoder. However, if the numbers of layers differ, then which layers to use (the first decoder.n_layers or the last decoder.n_layers) is purely a design choice.
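
For what it's worth, here is a minimal sketch of how the two slices differ when the encoder is bidirectional (an assumption for this example; the sizes are made up and the variable names are only illustrative). In that case h_n holds n_layers * 2 entries, ordered layer by layer with the forward direction first, so the first and the last n_layers entries are different tensors even when the encoder and decoder have the same number of layers.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only; none of these come from the tutorial.
n_layers, hidden_size, batch, seq_len = 2, 4, 3, 5

# Assumption: a stacked, bidirectional GRU encoder.
encoder = nn.GRU(input_size=hidden_size, hidden_size=hidden_size,
                 num_layers=n_layers, bidirectional=True)

x = torch.randn(seq_len, batch, hidden_size)
# outputs: [seq_len, batch, 2 * hidden_size]
# h_n:     [n_layers * 2, batch, hidden_size]
outputs, h_n = encoder(x)

# With n_layers = 2 the four entries of h_n are:
# [layer 0 forward, layer 0 backward, layer 1 forward, layer 1 backward]
first_slice = h_n[:n_layers]    # layer 0 forward, layer 0 backward
last_slice = h_n[-n_layers:]    # layer 1 forward, layer 1 backward

# The last layer's final states also appear in `outputs`:
# forward direction at the last time step, backward direction at the first.
last_fwd_matches = torch.allclose(outputs[-1, :, :hidden_size], h_n[-2])
last_bwd_matches = torch.allclose(outputs[0, :, hidden_size:], h_n[-1])
```

So with a bidirectional encoder, only the slice taken from the end of h_n corresponds to the top layer of the stack.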

I_H_Yoo · June 24, 2020, 3:51pm

Thanks again. By the way, if I design a bidirectional encoder with a stacked RNN, the encoder's hidden state ($h_n$) has shape [# layers * # dir, Batch, Hidden], which doesn’t match the shape of $H$, [Seq_len, Batch, # dir * hid], so I can't feed it directly as the decoder’s initial hidden state. How can I solve this? Just use an FC layer?

harsha_g · June 24, 2020, 4:25pm

The answer to this question depends on how many layers and directions your decoder has. If you have not come across them already, I highly recommend @bentrevett’s pytorch-seq2seq tutorials (a huge shoutout to him, assuming he is the same person as the owner of this repo), where he walks you through all the shape transformations inside the encoder and decoder.
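
As one possible approach, here is a rough sketch for the case of a stacked bidirectional encoder and a unidirectional decoder with the same number of layers: for each layer, concatenate the forward and backward final states and project them with a linear layer. The `Bridge` class, its argument names, and all sizes are hypothetical, and a GRU is used to keep it short; with an LSTM the same mapping would have to be applied to the cell state as well.

```python
import torch
import torch.nn as nn


class Bridge(nn.Module):
    """Hypothetical helper: maps a bidirectional encoder's h_n to a
    unidirectional decoder's initial hidden state, layer by layer."""

    def __init__(self, enc_hidden, dec_hidden, n_layers):
        super().__init__()
        self.n_layers = n_layers
        self.fc = nn.Linear(2 * enc_hidden, dec_hidden)

    def forward(self, h_n):
        # h_n: [n_layers * 2, batch, enc_hidden]
        _, batch, enc_hidden = h_n.shape
        # Separate layers from directions: [n_layers, 2, batch, enc_hidden]
        h = h_n.reshape(self.n_layers, 2, batch, enc_hidden)
        # Concatenate forward and backward states of each layer
        h = torch.cat([h[:, 0], h[:, 1]], dim=-1)  # [n_layers, batch, 2 * enc_hidden]
        # Project to the decoder's hidden size
        return torch.tanh(self.fc(h))              # [n_layers, batch, dec_hidden]


# Illustrative sizes only.
n_layers, emb, enc_hidden, dec_hidden, batch, seq_len = 2, 6, 8, 16, 3, 5

encoder = nn.GRU(emb, enc_hidden, num_layers=n_layers, bidirectional=True)
decoder = nn.GRU(emb, dec_hidden, num_layers=n_layers)
bridge = Bridge(enc_hidden, dec_hidden, n_layers)

src = torch.randn(seq_len, batch, emb)
enc_outputs, h_n = encoder(src)

decoder_hidden = bridge(h_n)            # [n_layers, batch, dec_hidden]
dec_input = torch.randn(1, batch, emb)  # one decoding step
dec_output, decoder_hidden = decoder(dec_input, decoder_hidden)
```

If the decoder has fewer layers than the encoder, the same idea could be applied to only the last few layers of the reshaped tensor; which layers to keep is again a design choice.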

I_H_Yoo · June 24, 2020, 5:54pm

I actually saw that and couldn’t find anything that helps me solve my problem. The link only implements a 1-layer LSTM.