Bidirectional LSTM for classification

ironv · September 30, 2019, 12:03am

I am using a bidirectional LSTM for a binary classification model on text sequences.

self.rnn = nn.LSTM(embed_size, hidden_size, batch_first=True, bidirectional=True)
out, (ht, ct) = self.rnn(X_packed)  # X_packed: packed, embedded input batch
print(ht.shape)  # (num_layers * 2, batch, hidden_size)

For a batch size of 64 and hidden_size=128, ht has shape 2 x 64 x 128. It is then fed into a fully connected (FC) layer, and the result is passed through a sigmoid activation.
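A minimal, self-contained sketch that reproduces this shape. Only the batch size of 64 and hidden_size=128 come from the post; the embedding size, maximum length and random inputs are assumptions for illustration. Note that h_n has shape (num_layers * num_directions, batch, hidden_size) and, unlike out, is not affected by batch_first:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)

# batch_size and hidden_size match the question; max_len and embed_size are assumed.
batch_size, max_len, embed_size, hidden_size = 64, 20, 50, 128

rnn = nn.LSTM(embed_size, hidden_size, batch_first=True, bidirectional=True)

# Random "embedded" batch with variable sequence lengths (made-up data).
X = torch.randn(batch_size, max_len, embed_size)
lengths = torch.randint(1, max_len + 1, (batch_size,))
X_packed = pack_padded_sequence(X, lengths, batch_first=True, enforce_sorted=False)

out, (ht, ct) = rnn(X_packed)

# One final hidden state per direction: (num_layers * 2, batch, hidden_size).
print(ht.shape)  # torch.Size([2, 64, 128])
```

With enforce_sorted=False, PyTorch restores the original batch order in ht, so row i of ht[0] and ht[1] belongs to sample i.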

Should the input to the FC layer be ht[-1], i.e. 64 x 128, or the concatenation of both directions, torch.cat([ht[0], ht[-1]], dim=1), i.e. 64 x 256?

phan_phan · September 30, 2019, 4:45pm

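Here ht[0] is the final hidden state of the forward LSTM, and ht[1] is the final hidden state of the backward LSTM. Because the backward LSTM reads the sequence from right to left, its final state corresponds to the first time step of the output sequence, not the last.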
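A quick way to check this, sketched with small made-up sizes and unpadded, equal-length sequences (so out can be indexed directly without unpacking): the forward half of out at the last step matches ht[0], while the backward half at the first step matches ht[1]. This is also why out[:, -1, :] is not a substitute for concatenating ht[0] and ht[1]: its backward half has only seen the last token.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small arbitrary sizes for illustration.
batch_size, seq_len, embed_size, hidden_size = 4, 7, 10, 16
rnn = nn.LSTM(embed_size, hidden_size, batch_first=True, bidirectional=True)

X = torch.randn(batch_size, seq_len, embed_size)  # no padding: every sequence is full length
out, (ht, ct) = rnn(X)  # out: (batch, seq_len, 2 * hidden_size)

# Forward direction: final state == output at the LAST step, first half of the features.
fwd_matches = torch.allclose(ht[0], out[:, -1, :hidden_size])
# Backward direction: it runs right to left, so its final state == output at the
# FIRST step, second half of the features.
bwd_matches = torch.allclose(ht[1], out[:, 0, hidden_size:])
print(fwd_matches, bwd_matches)  # True True
```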
You used a bidirectional LSTM, so you might as well use the outputs of both directions! Your solution, torch.cat([ht[0], ht[-1]], dim=1), looks correct.
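A minimal sketch of a full classifier built on the concatenated state. The class name BiLSTMClassifier, the embedding layer, vocab_size and the num_layers argument are hypothetical additions, not from the original post. It separates layers and directions by reshaping ht to (num_layers, 2, batch, hidden_size), the layout described in the nn.LSTM docs, so it also works for stacked LSTMs, where the last layer's states are ht[-2] (forward) and ht[-1] (backward):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class BiLSTMClassifier(nn.Module):
    """Hypothetical binary classifier: embedding -> BiLSTM -> FC on [h_fwd; h_bwd]."""

    def __init__(self, vocab_size, embed_size, hidden_size, num_layers=1):
        super().__init__()
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(vocab_size, embed_size, padding_idx=0)
        self.rnn = nn.LSTM(embed_size, hidden_size, num_layers=num_layers,
                           batch_first=True, bidirectional=True)
        # Both directions are concatenated, hence 2 * hidden_size input features.
        self.fc = nn.Linear(2 * hidden_size, 1)

    def forward(self, tokens, lengths):
        emb = self.embedding(tokens)  # (batch, seq_len, embed_size)
        packed = pack_padded_sequence(emb, lengths.cpu(), batch_first=True,
                                      enforce_sorted=False)
        _, (ht, _) = self.rnn(packed)  # ht: (num_layers * 2, batch, hidden_size)
        # Split layers and directions, then keep the last layer's two final states.
        last = ht.reshape(self.num_layers, 2, ht.size(1), self.hidden_size)[-1]
        h = torch.cat([last[0], last[1]], dim=1)  # (batch, 2 * hidden_size)
        return self.fc(h).squeeze(1)  # raw logits, shape (batch,)


torch.manual_seed(0)
model = BiLSTMClassifier(vocab_size=1000, embed_size=50, hidden_size=128)
tokens = torch.randint(1, 1000, (64, 20))  # fake token ids; 0 is reserved for padding
lengths = torch.randint(1, 21, (64,))      # fake sequence lengths
labels = torch.randint(0, 2, (64,)).float()

logits = model(tokens, lengths)
loss = nn.BCEWithLogitsLoss()(logits, labels)  # sigmoid is folded into the loss
probs = torch.sigmoid(logits)                  # probabilities for inference
```

Returning raw logits and training with nn.BCEWithLogitsLoss is usually preferred over an explicit sigmoid followed by nn.BCELoss, because the combined loss is more numerically stable; torch.sigmoid is still applied at inference time to get probabilities.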