How to make an LSTM Bidirectional?

Instead of lstm_out, use the last hidden state. In other words, instead of

lstm_out, _ = self.lstm(embeds)

do

lstm_out, hidden = self.lstm(embeds)

And use hidden, since it contains the last hidden state with respect to both directions, which is much more convenient. (For an LSTM, hidden is actually the tuple (h_n, c_n); h_n is the part you want.) If you use lstm_out instead, the last hidden state of the forward direction is at index -1 and the last hidden state of the backward direction is at index 0, along the sequence dimension of the tensor.
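For concreteness, here is a minimal sketch (not from the original model; the dimensions and the name embeds are just assumptions for illustration) showing where the two directional states live in lstm_out and in the returned hidden state:

import torch
import torch.nn as nn

# Toy dimensions (assumed): seq_len=5, batch=3, input_size=8, hidden_dim=16
lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=1, bidirectional=True)
embeds = torch.randn(5, 3, 8)            # (seq_len, batch, input_size), batch_first=False

lstm_out, (h_n, c_n) = lstm(embeds)      # lstm_out: (5, 3, 2*16), h_n: (2, 3, 16)

# Forward direction: last time step, first half of the feature dimension
forward_last = lstm_out[-1, :, :16]
# Backward direction: first time step, second half of the feature dimension
backward_last = lstm_out[0, :, 16:]

# h_n already contains exactly these two states, one row per direction
assert torch.allclose(forward_last, h_n[0])
assert torch.allclose(backward_last, h_n[1])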

Note that you still have to use view() or similar on hidden to get the correct hidden state (e.g., in case you have multiple layers). You can have a look at my code here; the important snippet is below. It's a bit verbose since I support both GRU/LSTM and uni-/bidirectional models:

# Extract last hidden state
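# self.hidden is h_n for a GRU and the tuple (h_n, c_n) for an LSTM;
# h_n has shape (num_layers * num_directions, batch_size, rnn_hidden_dim)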
if self.params.rnn_type == RnnType.GRU:
    final_state = self.hidden.view(self.params.num_layers, self.num_directions, batch_size, self.params.rnn_hidden_dim)[-1]
elif self.params.rnn_type == RnnType.LSTM:
    final_state = self.hidden[0].view(self.params.num_layers, self.num_directions, batch_size, self.params.rnn_hidden_dim)[-1]
# Handle directions
final_hidden_state = None
if self.num_directions == 1:
    final_hidden_state = final_state.squeeze(0)
elif self.num_directions == 2:
    h_1, h_2 = final_state[0], final_state[1]
    # final_hidden_state = h_1 + h_2               # Add both states (requires changes to the input size of first linear layer + attention layer)
    final_hidden_state = torch.cat((h_1, h_2), 1)  # Concatenate both states
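If you only need the bidirectional LSTM case, here is a self-contained sketch of the same idea (the module name, vocabulary size, and dimensions are made up for illustration and are not part of the linked code):

import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Hypothetical example: a bidirectional LSTM whose forward() returns the
    concatenated last hidden states of both directions of the last layer."""
    def __init__(self, vocab_size=100, embed_dim=8, hidden_dim=16, num_layers=2):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.num_directions = 2
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers, bidirectional=True)

    def forward(self, inputs):                    # inputs: (seq_len, batch)
        batch_size = inputs.size(1)
        embeds = self.embedding(inputs)           # (seq_len, batch, embed_dim)
        lstm_out, (h_n, c_n) = self.lstm(embeds)  # h_n: (num_layers * 2, batch, hidden_dim)
        # Separate the layer and direction dimensions, keep the last layer only
        h_n = h_n.view(self.num_layers, self.num_directions, batch_size, self.hidden_dim)
        final_state = h_n[-1]                     # (2, batch, hidden_dim)
        # Concatenate forward and backward states -> (batch, 2 * hidden_dim)
        return torch.cat((final_state[0], final_state[1]), dim=1)

encoder = BiLSTMEncoder()
tokens = torch.randint(0, 100, (5, 3))            # (seq_len=5, batch=3)
print(encoder(tokens).shape)                       # torch.Size([3, 32])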