[seq2seq] How does decoder's initial hidden state work?

howard.lo · January 26, 2018, 6:55am

Hello, I am implementing a seq2seq model and found a problem here:

1. The encoder is a 2-layer bidirectional RNN, so the first dimension of its last hidden state is 2*2=4.
2. The decoder is a 2-layer unidirectional RNN. How is the encoder’s last hidden state passed to the decoder as its initial hidden state?
3. What if my decoder has only 1 layer (different from the encoder’s num_layers)? How does 2. work then?
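
To make the shapes in the question concrete, here is a minimal sketch assuming PyTorch's `nn.GRU` with the default `batch_first=False`. The sizes (`batch`, `seq_len`, `emb`, `hidden`) are arbitrary values picked for illustration and are not from the original post. It shows that the last hidden state has first dimension 4, and that this dimension can be split into separate layer and direction axes.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumptions, not from the original post).
batch, seq_len, emb, hidden = 3, 5, 8, 16

# 2-layer bidirectional encoder, as described in the question.
encoder = nn.GRU(emb, hidden, num_layers=2, bidirectional=True)
src = torch.randn(seq_len, batch, emb)

outputs, h_n = encoder(src)
# outputs: (seq_len, batch, 2 * hidden), top layer only, both directions
# h_n:     (num_layers * num_directions, batch, hidden) = (4, 3, 16)

# Split the first dimension into (num_layers, num_directions).
# Index [layer, 0] is the forward direction, [layer, 1] the backward one.
h_n_by_layer = h_n.reshape(2, 2, batch, hidden)

top_forward = h_n_by_layer[1, 0]   # forward state after the last time step
top_backward = h_n_by_layer[1, 1]  # backward state after reading back to step 0
```

The decoder in item 2 expects an initial hidden state of shape `(2, batch, decoder_hidden)`, so the 4 slices above have to be reduced to 2 in some way before they can be passed in.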

Ujan_Deb · March 15, 2018, 11:06am

Did you solve this? I have the same question.

As for a different number of layers, I guess you could just initialize the layer in the decoder. But I haven’t come across a different number of layers in the encoder and decoder in any of the implementations I’ve seen. Why do you need this?

Edit: I’ve found two approaches to address this:

1. Concatenate the hidden states of the forward and backward passes.
2. Use the hidden states from either the forward or the backward pass.

I don’t know what effect either of the two options will have on model performance. You could try each one and see what works best for your model.
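
The two approaches can be sketched roughly as follows, assuming PyTorch's `nn.GRU`. All sizes and variable names (`h0_concat`, `h0_fwd`, `h0_top`, `decoder_wide`, and so on) are made up for illustration. The 1-layer case at the end is only one possible choice (keep the top encoder layer and drop the rest), not something confirmed in this thread.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumptions, not from the thread).
batch, seq_len, emb, hidden, num_layers = 3, 5, 8, 16, 2

encoder = nn.GRU(emb, hidden, num_layers=num_layers, bidirectional=True)
_, h_n = encoder(torch.randn(seq_len, batch, emb))

# (num_layers * 2, batch, hidden) -> (num_layers, 2, batch, hidden)
h_n = h_n.reshape(num_layers, 2, batch, hidden)
fwd, bwd = h_n[:, 0], h_n[:, 1]  # each: (num_layers, batch, hidden)

# Approach 1: concatenate forward and backward states per layer.
# The decoder then needs hidden size 2 * hidden.
h0_concat = torch.cat([fwd, bwd], dim=2)  # (num_layers, batch, 2 * hidden)
decoder_wide = nn.GRU(emb, 2 * hidden, num_layers=num_layers)

# Approach 2: keep only one direction (forward here).
# The decoder keeps the encoder's hidden size.
h0_fwd = fwd.contiguous()  # (num_layers, batch, hidden)
decoder_narrow = nn.GRU(emb, hidden, num_layers=num_layers)

# 1-layer decoder (assumption): keep only the top encoder layer.
h0_top = h0_concat[-1:]  # (1, batch, 2 * hidden)
decoder_single = nn.GRU(emb, 2 * hidden, num_layers=1)

# One decoding step with each variant to check that the shapes line up.
tgt_step = torch.randn(1, batch, emb)
out_wide, _ = decoder_wide(tgt_step, h0_concat)
out_narrow, _ = decoder_narrow(tgt_step, h0_fwd)
out_single, _ = decoder_single(tgt_step, h0_top)
```

With an LSTM instead of a GRU, the same reshaping would have to be applied to both the hidden state and the cell state, since the decoder expects the pair `(h_0, c_0)`.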