[seq2seq] How does the decoder's initial hidden state work?

Hello, I am implementing a seq2seq model and ran into a problem:

  1. The encoder is a 2-layer bidirectional RNN, so the first dimension of its last hidden state is 2*2=4 (num_layers * num_directions).
  2. The decoder is a 2-layer uni-directional RNN. How does the encoder's last hidden state get passed to the decoder's initial hidden state? (The sketch after this list shows the shapes I mean.)
  3. And what if my decoder has only 1 layer (a num_layers different from the encoder's)? How does 2. work then?
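To make the shapes concrete, here is a minimal sketch with made-up sizes (I'm assuming GRU layers; the actual model may differ):

```python
import torch
import torch.nn as nn

# Hypothetical sizes, just to show the shapes in question.
batch, seq_len, input_size, hidden_size = 3, 7, 10, 16

encoder = nn.GRU(input_size, hidden_size, num_layers=2,
                 bidirectional=True, batch_first=True)
decoder = nn.GRU(hidden_size, hidden_size, num_layers=2,
                 bidirectional=False, batch_first=True)

x = torch.randn(batch, seq_len, input_size)
_, enc_hidden = encoder(x)

# Encoder hidden state: [num_layers * num_directions, batch, hidden] = [4, 3, 16],
# but the decoder expects an initial hidden state of shape [2, 3, 16].
print(enc_hidden.shape)  # torch.Size([4, 3, 16])
```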

Did you solve this? I have the same question.

As for a different number of layers, I guess you could just initialize the decoder's initial state yourself, e.g. from only part of the encoder's hidden state (one way is sketched below). But I haven't come across a different number of layers in the encoder and decoder in any of the implementations I've seen. Why do you need this?
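One possible way to handle the layer mismatch; this is only a sketch under the assumption that you keep the encoder's top layer, not a standard recipe:

```python
import torch

# Hypothetical sizes; enc_hidden stands in for the last hidden state of a
# 2-layer bidirectional encoder: [num_layers * num_directions, batch, hidden].
num_layers, num_directions, batch, hidden = 2, 2, 3, 16
enc_hidden = torch.randn(num_layers * num_directions, batch, hidden)

# Separate layers and directions: [num_layers, num_directions, batch, hidden].
h = enc_hidden.view(num_layers, num_directions, batch, hidden)

# Keep only the top encoder layer and sum its two directions,
# giving [1, batch, hidden] to initialize a 1-layer decoder.
dec_init = h[-1].sum(dim=0, keepdim=True)
print(dec_init.shape)  # torch.Size([1, 3, 16])
```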

Edit: I've found two approaches to address this:

  1. Concatenate the hidden states of the forward and backward passes.

  2. Use the hidden states from either the forward or the backward pass.

I don't know what effect either option has on model performance. You could try both and see what works best for your model. A rough sketch of the two options follows.
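Here is how I'd write both options in PyTorch. The `bridge` projection in option 1 is my own addition (an assumption) to get the concatenated vector back down to the decoder's hidden size; it's not something required by the API:

```python
import torch
import torch.nn as nn

# Hypothetical sizes; enc_hidden is the encoder's last hidden state,
# shape [num_layers * num_directions, batch, hidden].
num_layers, batch, hidden = 2, 3, 16
enc_hidden = torch.randn(num_layers * 2, batch, hidden)

# Separate layers and directions: [num_layers, 2, batch, hidden].
h = enc_hidden.view(num_layers, 2, batch, hidden)

# Option 1: concatenate forward/backward per layer, then project back to hidden
# (the Linear "bridge" and the tanh are assumptions, not part of nn.GRU).
bridge = nn.Linear(2 * hidden, hidden)
concat = torch.cat([h[:, 0], h[:, 1]], dim=-1)   # [num_layers, batch, 2*hidden]
dec_init_1 = torch.tanh(bridge(concat))          # [num_layers, batch, hidden]

# Option 2: keep only one direction (forward here).
dec_init_2 = h[:, 0].contiguous()                # [num_layers, batch, hidden]
```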