I have some working code (runs and learns) that uses an nn.LSTM
for text classification. I tried modifying the code to work with packed sequences, but while it still runs, the loss no longer decreases (it just stays flat). I made only two modifications:
FIRST: I sort the data (shape (B, T, D)) and the sequence lengths (both LongTensors) before passing them to Variable, using the following function:
def sort_batch(data, seq_len):
    batch_size = data.size(0)
    # Sort lengths ascending, then reverse both lengths and data so the
    # batch ends up in descending-length order, as pack_padded_sequence expects.
    sorted_seq_len, sorted_idx = seq_len.sort()
    reverse_idx = torch.linspace(batch_size - 1, 0, batch_size).long()
    sorted_seq_len = sorted_seq_len[reverse_idx]
    sorted_data = data[sorted_idx][reverse_idx]
    return sorted_data, sorted_seq_len
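As a quick sanity check on the sorting (a minimal, self-contained sketch with a toy batch; the tensor values and shapes are made up for illustration, and it's written against the current API without Variable wrapping), the function should return both the data and the lengths in descending-length order:

```python
import torch

def sort_batch(data, seq_len):
    batch_size = data.size(0)
    sorted_seq_len, sorted_idx = seq_len.sort()
    reverse_idx = torch.linspace(batch_size - 1, 0, batch_size).long()
    sorted_seq_len = sorted_seq_len[reverse_idx]
    sorted_data = data[sorted_idx][reverse_idx]
    return sorted_data, sorted_seq_len

# Toy batch: B=3 sequences, T=4 time steps, D=2 features,
# with true lengths 2, 4, 3.
data = torch.arange(24).view(3, 4, 2).float()
seq_len = torch.LongTensor([2, 4, 3])

sorted_data, sorted_len = sort_batch(data, seq_len)
# sorted_len comes out as [4, 3, 2], and the rows of sorted_data are
# permuted the same way, so sorted_data[0] is the length-4 row (data[1]).
```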
SECOND: I modified the forward function in the model code from the word_language_model PyTorch example. For padded sequences I used:
def forward(self, input, hidden):
    emb = self.encoder(input)
    output, hidden = self.rnn(emb, hidden)
    # Take the output at the final time step.
    decoded = self.decoder(output[:, -1, :].squeeze())
    return F.log_softmax(decoded), hidden
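For context on why output[:, -1, :] is the right readout when every sequence runs the full length: with a single-layer batch_first LSTM, the output at the final time step is exactly the final hidden state h_n. A minimal sketch (toy sizes and names of my own, current API):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.LSTM(input_size=5, hidden_size=7, batch_first=True)

x = torch.randn(3, 4, 5)        # (B, T, D), all sequences full length
output, (h_n, c_n) = rnn(x)     # output: (B, T, H), h_n: (1, B, H)

last_step = output[:, -1, :]    # output at the final time step
# For a single-layer LSTM, last_step matches h_n[0] row for row.
```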
And for variable-length sequences I used:
def forward(self, input, seq_len, hidden):
    emb = self.encoder(input)
    emb = pack_padded_sequence(emb, list(seq_len.data), batch_first=True)
    output, hidden = self.rnn(emb, hidden)
    output, _ = pad_packed_sequence(output, batch_first=True)
    # Index of the last valid output for each sequence.
    idx = (seq_len - 1).view(-1, 1).expand(output.size(0), output.size(2)).unsqueeze(1)
    decoded = self.decoder(output.gather(1, idx).squeeze())
    return F.log_softmax(decoded), hidden
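To double-check the gather indexing on its own (a minimal sketch outside the model, with a toy LSTM, made-up sizes, and the current API without Variable wrapping): the gathered rows should equal each sequence's output at its last valid step, and, because packing stops each sequence at its true length, they should also match h_n:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
rnn = nn.LSTM(input_size=5, hidden_size=7, batch_first=True)

# Padded batch already sorted by descending length.
x = torch.randn(3, 4, 5)                  # (B, T, D)
seq_len = torch.LongTensor([4, 3, 2])

packed = pack_padded_sequence(x, seq_len.tolist(), batch_first=True)
packed_out, (h_n, _) = rnn(packed)
output, _ = pad_packed_sequence(packed_out, batch_first=True)  # (B, T, H)

# Same gather indexing as in the forward above.
idx = (seq_len - 1).view(-1, 1).expand(output.size(0), output.size(2)).unsqueeze(1)
last = output.gather(1, idx).squeeze(1)   # (B, H)

# last[i] should equal output[i, seq_len[i] - 1] and h_n[0, i].
```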
I believe each change is implemented correctly on its own, so I thought maybe there's something more fundamental I'm missing about Variables or forward, or perhaps I'm not using pack_padded_sequence correctly. Thanks in advance, and great job to everyone who's working hard on PyTorch. It's really terrific.