How to use the outputs of a deep LSTM layer?

Hi all,

I am trying to build a seq2seq attention-based model. In the literature, people often use deep LSTMs, where each layer produces a hidden state that is fed into the next layer. If I use N layers, I end up with N hidden states. What is the common approach to working with these?

One could sum all N to create a single hidden state, concatenate them, or just take the hidden state from the very last layer. I know that all of these are potentially useful approaches, but what is the most common way of handling this?
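For concreteness, here is a minimal PyTorch sketch of the three options, assuming a standard (unidirectional) `nn.LSTM` with hypothetical sizes chosen just for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only
num_layers, batch, hidden, input_size, seq_len = 3, 4, 8, 5, 10

lstm = nn.LSTM(input_size=input_size, hidden_size=hidden,
               num_layers=num_layers, batch_first=True)

x = torch.randn(batch, seq_len, input_size)
output, (h_n, c_n) = lstm(x)   # h_n: (num_layers, batch, hidden)

# Option 1: sum the per-layer hidden states -> (batch, hidden)
h_sum = h_n.sum(dim=0)

# Option 2: concatenate them along the feature dim -> (batch, num_layers * hidden)
h_cat = h_n.permute(1, 0, 2).reshape(batch, -1)

# Option 3: keep only the last layer's hidden state -> (batch, hidden)
h_last = h_n[-1]
```

Note that the concatenated version changes the feature dimension (`num_layers * hidden`), so any downstream layer (e.g. the attention projection) has to be sized accordingly.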

I don’t know which alternative is most used in the literature, but as you said, they are all viable options. The best thing you could do is test all of them and see which gives you the best results. My money is on the concatenated hidden states approach.