About variable-length input in an RNN scenario

I have some working code (it runs and learns) that uses an nn.LSTM for text classification. I tried modifying it to work with packed sequences, and while it still runs, the loss no longer decreases (it just stays flat). I made only two modifications:

FIRST: I sort the data (shape (B, T, D)) and the sequence lengths (both LongTensors) before wrapping them in Variable, using the following function:

def sort_batch(data, seq_len):
    # Sort a (B, T, D) batch by decreasing sequence length,
    # which is the order pack_padded_sequence expects.
    batch_size = data.size(0)
    sorted_seq_len, sorted_idx = seq_len.sort()  # ascending
    reverse_idx = torch.linspace(batch_size - 1, 0, batch_size).long()
    sorted_seq_len = sorted_seq_len[reverse_idx]  # flip to descending
    sorted_data = data[sorted_idx][reverse_idx]
    return sorted_data, sorted_seq_len
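
As a quick sanity check (toy shapes and made-up lengths, not my real data), calling it like this gives the lengths back in decreasing order, which is what pack_padded_sequence requires:

import torch

data = torch.randn(4, 6, 8)                # (B, T, D)
seq_len = torch.LongTensor([3, 6, 2, 5])
sorted_data, sorted_seq_len = sort_batch(data, seq_len)
print(sorted_seq_len)                      # 6, 5, 3, 2; sorted_data is reordered the same way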

SECOND: I modified the forward function of the model from the word_language_model PyTorch example. For padded sequences I used:

def forward(self, input, hidden):
    emb = self.encoder(input)                # (B, T, D), batch_first
    output, hidden = self.rnn(emb, hidden)   # output is (B, T, H)
    # Take the output at the final time step.
    decoded = self.decoder(output[:, -1, :].squeeze())
    return F.log_softmax(decoded), hidden
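
Just to convince myself that output[:, -1, :] really is the final hidden state in this case, here is a minimal sketch (toy names and shapes, assuming batch_first=True and a unidirectional, single-layer LSTM):

import torch
from torch import nn
from torch.autograd import Variable

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = Variable(torch.randn(4, 6, 8))          # (B, T, D)
output, (h_n, c_n) = lstm(x)
# Last time step of output == final hidden state of the (only) layer.
print(torch.equal(output[:, -1, :].data, h_n[-1].data))  # True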

And for the variable-length (packed) sequences I used:

def forward(self, input, seq_len, hidden):
    emb = self.encoder(input)
    # seq_len must already be sorted in decreasing order (see sort_batch above).
    emb = pack_padded_sequence(emb, list(seq_len.data), batch_first=True)
    output, hidden = self.rnn(emb, hidden)
    output, _ = pad_packed_sequence(output, batch_first=True)  # back to (B, T, H)
    # Index of the last valid output for each sequence: idx is (B, 1, H), so
    # gather(1, idx) picks output[b, seq_len[b]-1, :] for every b.
    idx = (seq_len - 1).view(-1, 1).expand(output.size(0), output.size(2)).unsqueeze(1)
    decoded = self.decoder(output.gather(1, idx).squeeze())
    return F.log_softmax(decoded), hidden
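
And to check that the gather really picks out output[b, seq_len[b]-1, :] for every sequence, here is a toy sketch of just that indexing (plain tensors and made-up shapes, standing in for the Variables in the real forward):

import torch

B, T, H = 3, 5, 4
output = torch.randn(B, T, H)              # what pad_packed_sequence would return
seq_len = torch.LongTensor([5, 3, 2])      # already in decreasing order
idx = (seq_len - 1).view(-1, 1).expand(B, H).unsqueeze(1)   # (B, 1, H)
last = output.gather(1, idx).squeeze(1)                     # (B, H)
for b in range(B):
    print(torch.equal(last[b], output[b, seq_len[b] - 1]))  # True for every b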

I believe each addition is implemented correctly, so I thought maybe there’s something more fundamental I’m missing about Variables or forward, or perhaps I’m not using pack_padded_sequence correctly. Thanks in advance, and great job to everyone who’s working hard on PyTorch. It’s really terrific.
