Strictly speaking, you don't have LSTM cells but LSTM layers; at least I think your code was using `nn.LSTM` and not `nn.LSTMCell`.
As far as I understand, an LSTM with 2 layers is the same as having 2 LSTM layers and using the output of the first as the input of the second. I assume there are some differences in the details, like the initialization of the hidden state, but overall it should be the same. In short, I would use only one LSTM, just with 2 or more layers.
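To make this concrete, here is a minimal sketch of the two setups; the module and variable names as well as the sizes are mine, not from your code:

```python
import torch
import torch.nn as nn

input_size, hidden_size, batch_size, seq_len = 8, 16, 4, 10
x = torch.randn(batch_size, seq_len, input_size)

# Option A: one LSTM with 2 stacked layers
lstm_stacked = nn.LSTM(input_size, hidden_size, num_layers=2, batch_first=True)
out_a, (h_a, c_a) = lstm_stacked(x)

# Option B: two single-layer LSTMs, the output of the first feeding the second
lstm_1 = nn.LSTM(input_size, hidden_size, batch_first=True)
lstm_2 = nn.LSTM(hidden_size, hidden_size, batch_first=True)
out_1, _ = lstm_1(x)
out_b, (h_b, c_b) = lstm_2(out_1)

print(out_a.shape, out_b.shape)  # both: torch.Size([4, 10, 16])
```

One small practical difference: the `dropout` argument of `nn.LSTM` only applies between stacked layers, so in the two-module variant you would have to add an `nn.Dropout` between the two LSTMs yourself.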
Using `view()` or `reshape()` is not intrinsically wrong; one just has to be careful to do it right. For example, if you look at the `_flatten()` method of the decoder:
```python
return h.transpose(0, 1).contiguous().view(batch_size, -1)  # Correct
```
Here I use `view()` to flatten the 3-dim tensor. But I can only do this because I first got the tensor into the right "shape". If I had used, say,
```python
return h.contiguous().view(batch_size, -1)  # Wrong
```
I wouldn’t get an error since the shape is still correct. However, the data is now kind of scrambled; see my example in that old post.
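For illustration, here is a small standalone example of that scrambling; the shapes (2 layers, batch of 3, hidden size 4) are made up, not taken from your code:

```python
import torch

# h mimics a last hidden state of shape (num_layers, batch, hidden)
h = torch.arange(2 * 3 * 4).view(2, 3, 4)
batch_size = 3

correct = h.transpose(0, 1).contiguous().view(batch_size, -1)
wrong = h.contiguous().view(batch_size, -1)

print(correct[0])  # tensor([ 0,  1,  2,  3, 12, 13, 14, 15]) -> both layers of sample 0
print(wrong[0])    # tensor([0, 1, 2, 3, 4, 5, 6, 7]) -> layer 0 of samples 0 AND 1, mixed
```

Both results have the same shape, but the "wrong" version concatenates data from different batch samples.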
Technically, you only need to flatten the tensor (e.g., the last hidden state of the LSTM) if you intend to push it through some additional linear layers before giving it to the decoder. Without that, and assuming your encoder and decoder LSTM have the same setup (same number of layers, same hidden dimension, both uni-/bi-directional), you can simply set the initial hidden state of the decoder to the last hidden state of the encoder.
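In code, that handoff can look like the following sketch; the encoder/decoder modules and all sizes here are my own placeholders, assuming identical setups on both sides:

```python
import torch
import torch.nn as nn

num_layers, hidden_size = 2, 16
encoder = nn.LSTM(input_size=8, hidden_size=hidden_size, num_layers=num_layers, batch_first=True)
decoder = nn.LSTM(input_size=8, hidden_size=hidden_size, num_layers=num_layers, batch_first=True)

src = torch.randn(4, 10, 8)  # (batch, src_len, features)
tgt = torch.randn(4, 7, 8)   # (batch, tgt_len, features)

_, (h, c) = encoder(src)           # h, c: (num_layers, batch, hidden_size)
dec_out, _ = decoder(tgt, (h, c))  # no flattening needed: the shapes already match
```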