Example of Many-to-One LSTM

tom · April 7, 2017, 12:10pm

The output at each time step has dimension hidden_size per direction (it’s the hidden state). In an LSTM, the output is the “modulated” cell state, i.e. the cell state passed through tanh and scaled by the output gate.

The number of directions is 1 for a “usual” LSTM and 2 for a bidirectional one.
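To make the shapes concrete, here is a small sketch comparing the two cases. The sizes are made up for illustration, and it assumes the default time-first layout (batch_first=False):

```python
import torch
import torch.nn as nn

# Illustrative sizes only (not from the original question).
seq_len, batch, input_size, hidden_size = 5, 3, 10, 20
inputs = torch.randn(seq_len, batch, input_size)

uni = nn.LSTM(input_size, hidden_size)                     # 1 direction
bi = nn.LSTM(input_size, hidden_size, bidirectional=True)  # 2 directions

uni_out, _ = uni(inputs)
bi_out, _ = bi(inputs)

# Last dimension is num_directions * hidden_size.
print(uni_out.shape)  # torch.Size([5, 3, 20])
print(bi_out.shape)   # torch.Size([5, 3, 40])
```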

Commonly, you would then set hidden_size to the target size and take the output at the last time step for each batch item, i.e.

x = output[-1]  # shape (batch, hidden_size) for a unidirectional LSTM with the default batch_first=False

You can then use x as input into whatever layer you want to have above the LSTM.
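Putting it together, a minimal many-to-one sketch might look like the following. The sizes and the linear layer `head` are placeholders I made up for illustration, and the LSTM is assumed to be single-layer, unidirectional, and time-first:

```python
import torch
import torch.nn as nn

# Illustrative sizes only; num_classes and `head` are hypothetical.
seq_len, batch, input_size, hidden_size, num_classes = 5, 3, 10, 20, 4

lstm = nn.LSTM(input_size, hidden_size)     # unidirectional, batch_first=False
head = nn.Linear(hidden_size, num_classes)  # whatever layer sits above the LSTM

inputs = torch.randn(seq_len, batch, input_size)
output, (h_n, c_n) = lstm(inputs)  # output: (seq_len, batch, hidden_size)

x = output[-1]    # last time step: (batch, hidden_size)
logits = head(x)  # one prediction per sequence: (batch, num_classes)
```

For this single-layer, unidirectional setup, `output[-1]` holds the same values as `h_n[-1]`, so either can be used.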

Best regards

Thomas