Hello @osm3000,
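The output at each time step has dimension hidden_size per direction (it is the hidden state). In an LSTM, the output is the “modulated” cell state, i.e. the cell state passed through tanh and scaled by the output gate.
The number of directions is 1 for a “usual” LSTM and 2 for a bidirectional one.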
Commonly, you would then use hidden_size as the target size and take the output at the last time step for each batch item, i.e.
x = output[-1] # shape (batch, hidden_size) for a unidirectional LSTM with the default batch_first=False
You can then use x as input into whatever layer you want to have above the LSTM.
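To make the shapes concrete, here is a minimal sketch of the above. The sizes, the variable names, and the nn.Linear layer called head are my own assumptions for illustration, not anything from your code, and it assumes the default batch_first=False layout.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only (assumptions, not taken from the original question).
seq_len, batch, input_size, hidden_size, num_classes = 5, 3, 10, 20, 4

lstm = nn.LSTM(input_size, hidden_size)      # unidirectional, batch_first=False
head = nn.Linear(hidden_size, num_classes)   # hypothetical layer above the LSTM

inputs = torch.randn(seq_len, batch, input_size)
output, (h_n, c_n) = lstm(inputs)            # output: (seq_len, batch, hidden_size)

x = output[-1]                               # last time step: (batch, hidden_size)
logits = head(x)                             # (batch, num_classes)

# With bidirectional=True the last dimension doubles (2 directions).
bi_lstm = nn.LSTM(input_size, hidden_size, bidirectional=True)
bi_output, _ = bi_lstm(inputs)               # (seq_len, batch, 2 * hidden_size)

print(output.shape, x.shape, logits.shape, bi_output.shape)
```

For a single-layer unidirectional LSTM, output[-1] is the same tensor as h_n[-1], so either can be fed to the next layer.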
Best regards
Thomas