Keras’s LSTM layer includes a single flag that reduces the full output sequence to one fixed-size vector of hidden dimension N. https://keras.io/layers/recurrent/
" * return_sequences : Boolean. Whether to return the last output in the output sequence, or the full sequence."
This lets you process a sequence, collapse it to a single embedding, and then pass that to something like a classifier. I don’t want to go token by token; I want to take a sequence of arbitrary length and reduce it to a fixed-size embedding.
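For concreteness, here is a minimal sketch of the Keras behavior I mean (the batch/timestep/feature sizes are just illustrative, not from any real model):

```python
import tensorflow as tf

x = tf.random.normal((32, 15, 10))  # (batch, timesteps, features)

# return_sequences=False (the default): only the last time step is
# returned, i.e. one fixed-size vector per input sequence.
lstm = tf.keras.layers.LSTM(20, return_sequences=False)
print(lstm(x).shape)      # (32, 20)

# return_sequences=True: the full output sequence is returned.
lstm_seq = tf.keras.layers.LSTM(20, return_sequences=True)
print(lstm_seq(x).shape)  # (32, 15, 20)
```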
I know there are guides and explanations for how to do this in PyTorch, but there are many mistakes and contradictions across sources, and a lot of quick speculation. I’d like a definitive answer, if one exists in documentation or example code.
The best diagram I’ve found is here. Even people writing tutorials seem confused about what to do.
rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=1, batch_first=True)
is roughly equivalent to lstm = tf.keras.layers.LSTM(units=20)
Note: Keras does not expose an option for how many LSTM layers to stack (you stack separate LSTM layers instead), so I set num_layers=1 to match. Keras also infers the input size, so there is no equivalent of input_size.
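For reference, here is the closest PyTorch equivalent I’ve been able to piece together — this is exactly the part I’d like confirmed, so treat it as a sketch rather than the canonical approach:

```python
import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=1, batch_first=True)
x = torch.randn(32, 15, 10)           # (batch, seq_len, input_size)

output, (h_n, c_n) = rnn(x)
print(output.shape)                   # (32, 15, 20) -- always the full sequence

# Option 1: slice the last time step yourself.
last = output[:, -1, :]               # (32, 20)

# Option 2: use the final hidden state of the last layer. For a
# single-layer, unidirectional LSTM this matches the slice above.
print(torch.allclose(last, h_n[-1]))  # True
```

If that equivalence (output[:, -1, :] matching h_n[-1] for a single-layer, unidirectional LSTM) is stated anywhere in the docs, that would settle it.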