Passing a minibatch of sequential data through a bidirectional RNN

Here’s what I think(!) is going on:

You want to train a network that takes a sequence of words as input to predict the next word (many-to-one). But what you’re actually training is a many-to-many network, more specifically a sequence tagging network such as one for POS or NER tagging.

Given your example, your data item should look more like:

input_sentence = [1, 4, 5, 7]
target_word = 9

Why should the network learn that 1 maps to 4, 4 maps to 5, and so on? That’s not the task, but it affects your loss and hence what your network learns.
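
If it helps, here is a minimal sketch of what I mean by one data item, assuming your sentences are already lists of word IDs. The sentence and variable names below are made up for illustration, not taken from your code:

```python
import torch

# Turn one tokenized sentence into (input sequence, next-word target) pairs
# for a many-to-one setup. The sentence here is just an illustrative example.
sentence = [1, 4, 5, 7, 9]

pairs = []
for i in range(1, len(sentence)):
    context = torch.tensor(sentence[:i])   # everything seen so far
    target = torch.tensor(sentence[i])     # the single word to predict
    pairs.append((context, target))

# The last pair is ([1, 4, 5, 7], 9): one full sequence in, one word out,
# so the loss is computed only on the final prediction.
```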

In my opinion, that would also explain why the Bi-LSTM does so much better than the LSTM. Since the Bi-LSTM also starts from the end of the sentence, given a sentence of length N it will learn that, for the last N-1 steps, the last input it read from the backward direction is exactly the next target. The plain LSTM would have to look into the future to do that.

So this is what I would try:

  • Change your dataset such that one data item is a sequence as input and a single word as output, and treat it like a classification (many-to-one) task
  • Take hidden and not out_packed_sequence as input for the fc1 layer! Since you only want the last state for many-to-one, and the last outputs of a Bi-LSTM sit on “opposite ends” of the output sequence (see also my post), it’s much simpler to use hidden. Here’s an example of an RNN-based classifier that might help, and a small sketch follows below. In your case, label_size will also be vocab_size.
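
For the second point, here is a rough sketch of a many-to-one Bi-LSTM classifier that uses hidden instead of the packed outputs. All names (NextWordClassifier, embed_dim, hidden_dim, etc.) are placeholders, not your actual code:

```python
import torch
import torch.nn as nn

# Rough sketch of a many-to-one Bi-LSTM classifier that feeds the last
# hidden states (not the packed outputs) into the final linear layer.
# vocab_size doubles as the label size, since the "classes" are the words.
class NextWordClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # 2 * hidden_dim because the forward and backward final states
        # get concatenated before the classification layer.
        self.fc1 = nn.Linear(2 * hidden_dim, vocab_size)

    def forward(self, x, lengths):
        # x: (batch, max_len) padded word IDs; lengths: CPU tensor or list
        embedded = self.embedding(x)
        packed = nn.utils.rnn.pack_padded_sequence(
            embedded, lengths, batch_first=True, enforce_sorted=False)
        _, (hidden, _) = self.lstm(packed)
        # hidden: (num_layers * 2, batch, hidden_dim); for a 1-layer Bi-LSTM,
        # hidden[-2] is the forward direction and hidden[-1] the backward one.
        last = torch.cat((hidden[-2], hidden[-1]), dim=1)
        return self.fc1(last)  # (batch, vocab_size) logits over next words
```

You would then train this with nn.CrossEntropyLoss on the single logit vector per sequence, using the next word as the class label.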

I hope that helps.
