I’m making my own version of the tutorial here on classifying surnames by their language of origin. I’d like to use an LSTM and train the model on batches of variable-length sequences. In other words, I’m trying to solve a many-to-one (many time steps to one label) classification problem with an LSTM and variable-length inputs.
The part I’m struggling with is properly designing the forward pass of my model to use packed sequences. If I were writing this model to handle batches of sequences that all had the same number of time steps, I might write something like this:
    def forward(self, inp, hidden):
        out, hidden = self.lstm(inp, hidden)
        last_lstm_step = out[-1]  # Since we only produce one label
        decoded = self.linear_decoder(last_lstm_step)
        return decoded, hidden
But since this model operates with PackedSequences which have a variable number of time-steps, we can’t just use out[-1] to get the last time step for each input sequence. Instead, we may try something like this:
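    from torch.nn.utils.rnn import pad_packed_sequence

    def forward(self, inp, hidden):
        out, hidden = self.lstm(inp, hidden)  # inp and out are PackedSequences
        (all_timesteps, lengths) = pad_packed_sequence(out)
        # Index into the padded tensor, not the PackedSequence itself
        last_step = last_steps(all_timesteps, lengths)
        decoded = self.linear_decoder(last_step)
        return decoded, hidden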
Where last_steps is something like this:
    import torch

    def last_steps(x, lengths):
        # x: (max_len, batch, hidden); lengths: true length of each sequence
        lasts = []
        for i, j in zip(range(x.size(1)), lengths):
            lasts.append(x[j - 1][i].view(1, -1))
        return torch.cat(lasts, 0)
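For comparison, here is a minimal sketch of the same last-step extraction without the Python loop, using advanced indexing on the padded output. The function name `last_steps_vectorized` and the toy sizes are my own assumptions, not part of the model above. It also checks the result against `h_n`, which for a single-layer, unidirectional `nn.LSTM` fed a packed sequence holds the hidden state at each sequence's last valid time step.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


def last_steps_vectorized(padded, lengths):
    """Pick the output at time index lengths[i] - 1 for each sequence i.

    padded:  (max_len, batch, hidden) tensor from pad_packed_sequence
    lengths: (batch,) tensor of true sequence lengths
    """
    batch_idx = torch.arange(padded.size(1))
    time_idx = lengths.to(torch.long) - 1
    return padded[time_idx, batch_idx]  # shape: (batch, hidden)


torch.manual_seed(0)
lstm = nn.LSTM(input_size=3, hidden_size=5)  # toy sizes, chosen arbitrarily
lengths = torch.tensor([4, 2, 1])            # sorted longest-first
inp = torch.randn(4, 3, 3)                   # (max_len, batch, features)

packed = pack_padded_sequence(inp, lengths)
out, (h_n, c_n) = lstm(packed)
padded, out_lengths = pad_packed_sequence(out)

last = last_steps_vectorized(padded, out_lengths)
# For a 1-layer unidirectional LSTM, h_n[-1] should hold the same values.
same_as_h_n = torch.allclose(last, h_n[-1])
```

If the two agree, then in this unidirectional setting the decoder could presumably read `h_n[-1]` directly and skip the unpacking step altogether.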
Unfortunately, this forward pass seems not to work with batches larger than one or two. With larger batch sizes the network fails to learn and often falls into guessing the same label for every sample. I suspect I’m doing something wrong in the “unpacking and getting last steps” phase of the forward pass, but I’m not sure what. Any help much appreciated.
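One way to test that suspicion, sketched here under assumptions (toy sizes, zero initial hidden state, sequences already sorted longest-first), is to compare the batched "unpack and take last steps" result against running each sequence through the same LSTM on its own. This is only a diagnostic sketch, not a diagnosis of the actual model.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


def last_steps(x, lengths):
    # Same logic as the helper in the post: row i is x[lengths[i] - 1][i].
    lasts = []
    for i, j in zip(range(x.size(1)), lengths):
        lasts.append(x[j - 1][i].view(1, -1))
    return torch.cat(lasts, 0)


torch.manual_seed(0)
lstm = nn.LSTM(input_size=3, hidden_size=5)  # toy sizes, chosen arbitrarily
lengths = [5, 3, 2, 1]                       # sorted longest-first
inp = torch.randn(5, 4, 3)                   # (max_len, batch, features)

with torch.no_grad():
    # Batched path: pack, run the LSTM, unpack, take each last valid step.
    out, _ = lstm(pack_padded_sequence(inp, lengths))
    padded, out_lengths = pad_packed_sequence(out)
    batched = last_steps(padded, out_lengths)

    # Reference path: run every sequence alone, without any padding.
    singles = []
    for i, n in enumerate(lengths):
        seq = inp[:n, i:i + 1]               # (n, 1, features)
        single_out, _ = lstm(seq)
        singles.append(single_out[-1])       # (1, hidden)
    one_by_one = torch.cat(singles, 0)

batch_matches_single = torch.allclose(batched, one_by_one, atol=1e-5)
```

If the two paths match on the real model as well, the extraction step is probably not the culprit, and it may be worth looking outside the forward pass, for example at whether the labels are still in the same order as the inputs after the batch is sorted by length.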