PyTorch equivalent to keras.layers.LSTM(return_sequences=False)

Keras’s LSTM layer includes a single flag that collapses the output from the full sequence down to just the final hidden state (one fixed-size vector per sequence):
"return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence."

This allows you to process a sequence, convert it to a single embedding, and then pass that to something like a classifier. I don’t want to go token by token. I want to take a sequence of arbitrary length and flatten it to a fixed embedding.

I know there are guides and explanations for how to do this in Pytorch but there are many mistakes and contradictions across different sources. There is a lot of quick speculation, and I’d like a definitive answer if it exists in documentation or example code.

The best diagram I’ve found is here

Even for people writing tutorials there is confusion about what to do


Can you please explain how to do this in PyTorch?

Hi,

Did you find a solution? Thanks!

The output of an nn.LSTM is output, (h_n, c_n) with the following shapes (with the default batch_first=False):

  • output.shape: (seq_len, batch, num_directions * hidden_size)
  • h_n.shape: (num_layers * num_directions, batch, hidden_size)
  • c_n.shape: (num_layers * num_directions, batch, hidden_size)

That means that output contains the full sequence as indicated by seq_len.
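Since output holds every time step, the equivalent of Keras’s return_sequences=False is simply the last time step of output, which for a single-layer, unidirectional LSTM equals the final hidden state h_n. A minimal sketch with the default batch_first=False (sizes are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Single-layer, unidirectional LSTM: input_size=10, hidden_size=20
lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=1)

x = torch.randn(3, 5, 10)  # (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([3, 5, 20]) - full sequence
print(h_n.shape)     # torch.Size([1, 5, 20]) - final hidden state

# The last time step of `output` equals h_n for this configuration,
# i.e. the return_sequences=False result:
assert torch.allclose(output[-1], h_n[0])
```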


Thanks @vdw! Did you mean that it is enough to take the last part of the sequence to match the Keras output? Did I understand correctly?

I’m not that familiar with Keras. Maybe this post will help?

Okay. I found the answer.

rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=1, batch_first=True)
is similar to lstm = tf.keras.layers.LSTM(units=20, return_sequences=True)

Note: a single Keras LSTM layer is always one layer (you stack separate layers instead), so I set num_layers=1. In Keras you also don’t have to provide the input feature size; it is inferred from the input.

import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 1, batch_first=True)

# input shape: [batch, sequence length, features]
input = torch.randn(5, 3, 10)

# h0/c0 shape: [num_layers * num_directions, batch, hidden_size]
h0 = torch.randn(1, 5, 20)
c0 = torch.randn(1, 5, 20)
output, (hn, cn) = rnn(input, (h0, c0))

# output shape: [batch, sequence length, hidden_size]
print(output.shape)  # torch.Size([5, 3, 20])
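Note that the snippet above returns the full sequence, i.e. it matches return_sequences=True. To reproduce return_sequences=False (the original question), take the last time step along the sequence dimension; for a single-layer, unidirectional LSTM this is the same vector as the last layer of hn. A minimal sketch, assuming batch_first=True as above:

```python
import torch
import torch.nn as nn

rnn = nn.LSTM(10, 20, 1, batch_first=True)
inp = torch.randn(5, 3, 10)  # [batch, seq_len, features]

# Initial states default to zeros when omitted
output, (hn, cn) = rnn(inp)

# Equivalent of return_sequences=False: one embedding per sequence
last = output[:, -1, :]          # shape [5, 20]

# Same vector as the final hidden state of the last layer
assert torch.allclose(last, hn[-1])
```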

Now, for Keras:

import tensorflow as tf

# inputs shape: [batch, timesteps, features]
inputs = tf.random.normal([5, 3, 10])
lstm = tf.keras.layers.LSTM(20, return_sequences=True)
output = lstm(inputs)

# output shape: [batch, timesteps, hidden units]
print(output.shape)  # TensorShape([5, 3, 20])
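The original question also asks about sequences of arbitrary length. With a padded batch, output[:, -1, :] would read padding positions, so one common approach (a sketch, not from this thread) is to pack the batch so that h_n holds the state at each sequence’s true last step:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

rnn = nn.LSTM(10, 20, 1, batch_first=True)

# Two sequences of lengths 4 and 2, zero-padded to length 4
padded = torch.randn(2, 4, 10)
padded[1, 2:] = 0  # padding positions for the shorter sequence
lengths = torch.tensor([4, 2])

packed = pack_padded_sequence(padded, lengths, batch_first=True,
                              enforce_sorted=True)
_, (h_n, _) = rnn(packed)

# h_n[-1] holds the hidden state at each sequence's actual last
# step, giving a fixed-size embedding regardless of length
embedding = h_n[-1]  # shape [2, 20]
```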