Replacing GRU with LSTM in an Encoder-Decoder architecture results in a dimension mismatch - why?

I think the Seq2Seq tutorial might be a good starting point, and it's generally worth checking the docs for both modules to see what input and state shapes they expect.
Besides that, @vdw has shared some excellent posts, e.g. here and here, which clarify the structure a bit more.
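
Without seeing your code I can only guess, but a common cause when swapping the two is that `nn.GRU` returns a single hidden-state tensor while `nn.LSTM` returns a tuple of hidden state and cell state. Here is a minimal sketch of the difference; the sizes and variable names are made up for illustration and are not taken from your model:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
seq_len, batch_size, input_size, hidden_size = 5, 3, 8, 16
x = torch.randn(seq_len, batch_size, input_size)  # default layout: (seq, batch, feature)

gru = nn.GRU(input_size, hidden_size)
lstm = nn.LSTM(input_size, hidden_size)

# GRU returns the output sequence and a single hidden-state tensor.
gru_out, gru_h = gru(x)

# LSTM returns the output sequence and a tuple (hidden state, cell state).
lstm_out, (lstm_h, lstm_c) = lstm(x)

gru_shapes = (tuple(gru_out.shape), tuple(gru_h.shape))
lstm_shapes = (tuple(lstm_out.shape), tuple(lstm_h.shape), tuple(lstm_c.shape))
print("GRU :", gru_shapes)
print("LSTM:", lstm_shapes)

# A decoder LSTM expects that (h, c) tuple as its initial state,
# not a single tensor as a decoder GRU would.
dec_lstm = nn.LSTM(hidden_size, hidden_size)
dec_in = torch.randn(1, batch_size, hidden_size)  # one decoding step
dec_out, (dec_h, dec_c) = dec_lstm(dec_in, (lstm_h, lstm_c))
dec_shape = tuple(dec_out.shape)
```

So if your encoder-decoder code passes the encoder's hidden state straight into the decoder, or indexes into it as if it were one tensor, that part would need to handle the `(h, c)` tuple once you switch to LSTM.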
