Hello, I am implementing a seq2seq model and ran into a problem:
1. The encoder is a 2-layer bidirectional RNN, so the first dimension of its last hidden state is 2*2=4.
2. The decoder is a 2-layer unidirectional RNN. How does the encoder's last hidden state get passed to the decoder's initial hidden state?
3. And what if my decoder has only 1 layer (a different num_layers from the encoder)? How does point 2 work then?
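To make the shapes concrete, here is a minimal sketch (my own example, using a GRU with made-up sizes) showing where the first dimension of 4 comes from and how it can be viewed as (num_layers, num_directions, batch, hidden):

```python
import torch
import torch.nn as nn

# hypothetical sizes: input_size=8, hidden=16, batch=3, seq_len=5
enc = nn.GRU(input_size=8, hidden_size=16, num_layers=2,
             bidirectional=True, batch_first=True)
x = torch.randn(3, 5, 8)            # (batch, seq_len, input_size)
out, h = enc(x)
# h: (num_layers * num_directions, batch, hidden) = (2*2, 3, 16)
print(h.shape)                      # torch.Size([4, 3, 16])

# per the PyTorch docs, h is layer-major and can be separated like this:
h = h.view(2, 2, 3, 16)             # (num_layers, num_directions, batch, hidden)
```

Once it is in (layers, directions, batch, hidden) form, you can pick or combine directions per layer before handing it to the decoder.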
Did you solve this? I have the same question.
As for a different number of layers, I guess you could just initialize the decoder's extra (or missing) layers yourself. But I haven't come across a different number of layers in the encoder and decoder in any implementation. Why do you need this?
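One common workaround for the mismatched-depth case (my own sketch, not from any particular implementation) is to keep only the top encoder layer when the decoder has fewer layers:

```python
import torch
import torch.nn as nn

hidden, batch = 16, 3               # hypothetical sizes
enc = nn.GRU(8, hidden, num_layers=2, bidirectional=True, batch_first=True)
_, h = enc(torch.randn(batch, 5, 8))            # (4, batch, hidden)
h = h.view(2, 2, batch, hidden)                 # (layers, directions, batch, hidden)

# keep only the top encoder layer, forward direction, for a 1-layer decoder
dec_h0 = h[-1, 0].unsqueeze(0).contiguous()     # (1, batch, hidden)
dec = nn.GRU(8, hidden, num_layers=1, batch_first=True)
out, _ = dec(torch.randn(batch, 4, 8), dec_h0)
```

The top layer is usually chosen because it carries the most abstract summary of the input, but that is a heuristic, not a rule.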
Edit : I’ve found two approaches to address this:
Concatenate the hidden states of the forward and backward passes.
Use the hidden states from either the forward or the backward pass.
I don't know what effect either option will have on model performance. You could try both and see what works best for your model.
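Both options above can be sketched in a few lines. This is my own illustration with made-up sizes; the Linear "bridge" in option 1 is an assumption I added so the concatenated state fits the decoder's hidden size, it is not the only way to do it:

```python
import torch
import torch.nn as nn

num_layers, hidden, batch = 2, 16, 3            # hypothetical sizes
enc = nn.GRU(8, hidden, num_layers=num_layers, bidirectional=True, batch_first=True)
_, h = enc(torch.randn(batch, 5, 8))
h = h.view(num_layers, 2, batch, hidden)        # (layers, directions, batch, hidden)

# Option 1: concatenate forward/backward states, then project back to `hidden`
# (the Linear bridge is my own addition so the decoder's hidden size still matches)
bridge = nn.Linear(2 * hidden, hidden)
h_cat = torch.cat([h[:, 0], h[:, 1]], dim=-1)   # (layers, batch, 2*hidden)
dec_h0_concat = torch.tanh(bridge(h_cat))       # (layers, batch, hidden)

# Option 2: keep the hidden states from only one direction (here: forward)
dec_h0_forward = h[:, 0].contiguous()           # (layers, batch, hidden)

dec = nn.GRU(8, hidden, num_layers=num_layers, batch_first=True)
out, _ = dec(torch.randn(batch, 4, 8), dec_h0_forward)  # or dec_h0_concat
```

Option 1 keeps information from both directions at the cost of an extra parameter matrix; option 2 is parameter-free but discards the backward pass.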