Poor Training on PackedSequence batches with LSTM

I’m making my own version of the tutorial here on classifying surnames by their language of origin. I’d like to use an LSTM and train the model on batches of variable-length sequences. In other words, I’m trying to solve a many-to-one (many time steps to one label) classification problem with an LSTM and variable-length inputs.

The part I’m struggling with is designing my model’s forward pass so that it works with packed sequences. If I were writing this model to handle batches of sequences that all had the same number of time steps, I might write something like this:

def forward(self, inp, hidden):
    out, hidden = self.lstm(inp, hidden)
    last_lstm_step = out[-1] # Since we only produce one label
    decoded = self.linear_decoder(last_lstm_step)
    return decoded, hidden

But since this model operates with PackedSequences which have a variable number of time-steps, we can’t just use out[-1] to get the last time step for each input sequence. Instead, we may try something like this:

def forward(self, inp, hidden):
    out, hidden = self.lstm(inp, hidden)
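    all_timesteps, lengths = pad_packed_sequence(out)  # padded (max_len, batch, hidden) tensor plus true lengths
    last_step = last_steps(all_timesteps, lengths)  # index the padded tensor, not the PackedSequence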
    decoded = self.linear_decoder(last_step)
    return decoded, hidden

Where last_steps is something like this:

def last_steps(x, lengths):
    lasts = []
    for i, j in zip(range(x.size()[1]), lengths):
        lasts.append(x[j - 1][i].view(1, -1))
    return torch.cat(lasts, 0)
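
For what it's worth, the same selection can be done without the Python loop by using advanced indexing. This is only a sketch: the name last_steps_vectorized is made up for illustration, and it assumes the padded tensor is time-major, i.e. shaped (max_len, batch, features) as pad_packed_sequence returns it by default.

```python
import torch

def last_steps_vectorized(padded, lengths):
    """Pick the last valid time step of each sequence in a padded batch.

    padded:  (max_len, batch, features) tensor, time-major.
    lengths: true length of each sequence, one entry per batch element.
    """
    lengths = torch.as_tensor(lengths, dtype=torch.long, device=padded.device)
    batch_idx = torch.arange(padded.size(1), device=padded.device)
    # Pairs (lengths[i] - 1, i) select one time step per batch element.
    return padded[lengths - 1, batch_idx]

# Tiny check on a hand-built tensor: 4 time steps, 3 sequences, 2 features.
padded = torch.arange(4 * 3 * 2, dtype=torch.float32).view(4, 3, 2)
lengths = torch.tensor([4, 3, 1])
picked = last_steps_vectorized(padded, lengths)
```

It should return a (batch, features) tensor, the same shape the loop version produces.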

Unfortunately, this forward pass seems not to work with batches larger than one or two. With larger batch sizes the network fails to learn and often falls into guessing the same label for every sample. I suspect I’m doing something wrong in the “unpacking and getting last steps” phase of the forward pass, but I’m not sure what. Any help would be much appreciated.
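
One thing that might be worth ruling out (this is a guess, nothing in the post confirms it) is the batch construction rather than the forward pass. By default pack_padded_sequence expects sequences sorted by decreasing length, so if the inputs get sorted, the labels have to be reordered the same way or they no longer line up with the outputs. A minimal sketch, where make_batch and the toy data are hypothetical:

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

def make_batch(sequences, labels):
    """sequences: list of (len_i, features) tensors; labels: list of ints."""
    lengths = torch.tensor([seq.size(0) for seq in sequences])
    # Sort by decreasing length and apply the SAME order to the labels.
    order = torch.argsort(lengths, descending=True)
    sorted_seqs = [sequences[i] for i in order.tolist()]
    sorted_labels = torch.tensor(labels)[order]
    padded = pad_sequence(sorted_seqs)  # (max_len, batch, features)
    packed = pack_padded_sequence(padded, lengths[order])
    return packed, sorted_labels

# Toy data: three sequences of lengths 2, 4 and 3 with labels 10, 11, 12.
seqs = [torch.full((2, 3), 0.0), torch.full((4, 3), 1.0), torch.full((3, 3), 2.0)]
packed, sorted_labels = make_batch(seqs, [10, 11, 12])
```

With batch size one there is nothing to reorder, which would at least be consistent with the problem only showing up on larger batches.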

An example using batches of variable length sentences could also be a great extension to this tutorial. I haven’t found anything about PackedSequence in the tutorials so far.

This reply by @Tudor_Berariu on the topic of Many-to-One LSTMs relates closely to what I’m trying to do. But how do I “Just take the last element from that output sequence.” when my input batch is a PackedSequence of samples of variable length (in terms of time-steps)?

output, (hn, cn) = LSTM(packed_input)

hn contains the hidden state at the last time step of each sequence, which means it should already be the last element of the output sequence (for a single-layer, unidirectional LSTM). Even when your LSTM takes variable-length input as a PackedSequence, it should look something like this:

a b c d 
e f g 0
h i j 0
k l 0 0
# hn:
d
g
j
l
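
A quick way to convince yourself of this is to compare hn with the last valid step picked out of the unpacked output. The sketch below uses made-up sizes (5 input features, 7 hidden units) and the same lengths as the diagram above (4, 3, 3, 2), and assumes a single-layer, unidirectional LSTM.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
lstm = nn.LSTM(input_size=5, hidden_size=7)  # single layer, unidirectional

lengths = torch.tensor([4, 3, 3, 2])   # already sorted in decreasing order
padded_in = torch.randn(4, 4, 5)       # (max_len, batch, features)
packed_in = pack_padded_sequence(padded_in, lengths)

packed_out, (hn, cn) = lstm(packed_in)
padded_out, out_lengths = pad_packed_sequence(packed_out)

# Last valid output of each sequence, taken from the padded output tensor.
last_from_output = padded_out[out_lengths - 1, torch.arange(4)]

# hn has shape (num_layers, batch, hidden); hn[-1] is the top layer.
same = torch.allclose(hn[-1], last_from_output)
```

So in the many-to-one case you can feed hn[-1] straight into the linear decoder and skip the unpacking step entirely.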