I have some working code (runs and learns) that uses an `nn.LSTM` for text classification. I tried modifying my code to work with packed sequences, and while it runs, the loss no longer decreases (it just stays flat). Only two modifications were made:

**FIRST:** I sort the data `(B, T, D)` and sequence lengths (both `LongTensor`s) before passing them to `Variable`, using the following function:

```
def sort_batch(data, seq_len):
    batch_size = data.size(0)
    # Sort ascending, then reverse so lengths are descending,
    # as pack_padded_sequence requires.
    sorted_seq_len, sorted_idx = seq_len.sort()
    reverse_idx = torch.linspace(batch_size - 1, 0, batch_size).long()
    sorted_seq_len = sorted_seq_len[reverse_idx]
    sorted_data = data[sorted_idx][reverse_idx]
    return sorted_data, sorted_seq_len
```
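As a sanity check (toy shapes assumed, not from my actual data), the function can be exercised on a tiny batch to confirm the lengths come back in the descending order `pack_padded_sequence` requires and that the data rows are permuted consistently:

```python
import torch

def sort_batch(data, seq_len):
    batch_size = data.size(0)
    sorted_seq_len, sorted_idx = seq_len.sort()
    reverse_idx = torch.linspace(batch_size - 1, 0, batch_size).long()
    sorted_seq_len = sorted_seq_len[reverse_idx]
    sorted_data = data[sorted_idx][reverse_idx]
    return sorted_data, sorted_seq_len

data = torch.arange(6).view(3, 2, 1)   # (B=3, T=2, D=1)
seq_len = torch.tensor([1, 2, 1])
sorted_data, sorted_len = sort_batch(data, seq_len)
print(sorted_len.tolist())             # [2, 1, 1] -- descending
# Note: any labels paired with `data` need the same permutation,
# i.e. labels[sorted_idx][reverse_idx].
```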

**SECOND:** I modified the `forward` function in the model code from the word_language_model PyTorch example. For padded sequences I used:

```
def forward(self, input, hidden):
    emb = self.encoder(input)
    output, hidden = self.rnn(emb, hidden)
    # Take the output at the final time step
    decoded = self.decoder(output[:, -1, :].squeeze())
    return F.log_softmax(decoded), hidden
```
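For context on why the packed version below can't simply take `output[:, -1, :]`: after a pack/pad round trip, positions past each sequence's length come back zero-filled. A minimal sketch (toy dimensions assumed, not my real model):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
rnn = nn.LSTM(input_size=4, hidden_size=3, batch_first=True)
x = torch.randn(2, 5, 4)        # (B=2, T=5, D=4)
lengths = torch.tensor([5, 2])  # already descending

packed = pack_padded_sequence(x, lengths, batch_first=True)
out, _ = rnn(packed)
out, out_len = pad_packed_sequence(out, batch_first=True)

# Steps beyond each sequence's length are zero-filled, so for the
# shorter sequence, out[:, -1, :] would read padding, not real output.
print(out[1, 2:].abs().sum())   # tensor(0.)
```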

And for the variable length sequences I used:

```
def forward(self, input, seq_len, hidden):
    emb = self.encoder(input)
    emb = pack_padded_sequence(emb, list(seq_len.data), batch_first=True)
    output, hidden = self.rnn(emb, hidden)
    output, _ = pad_packed_sequence(output, batch_first=True)
    # Index of the last output for each sequence.
    idx = (seq_len - 1).view(-1, 1).expand(output.size(0), output.size(2)).unsqueeze(1)
    decoded = self.decoder(output.gather(1, idx).squeeze())
    return F.log_softmax(decoded), hidden
```
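The `gather` indexing above can be checked in isolation on a toy "RNN output" tensor (shapes are illustrative assumptions); it should pick out `output[b, seq_len[b] - 1, :]` for each batch entry:

```python
import torch

# Toy output: (B=2, T=3, D=2); the two sequences have lengths 3 and 2.
output = torch.arange(12, dtype=torch.float).view(2, 3, 2)
seq_len = torch.tensor([3, 2])

# Build an index of shape (B, 1, D) pointing at each sequence's last
# valid time step, then gather along the time dimension.
idx = (seq_len - 1).view(-1, 1).expand(output.size(0), output.size(2)).unsqueeze(1)
last = output.gather(1, idx).squeeze(1)   # (B, D)
print(last)   # tensor([[4., 5.], [8., 9.]])
```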

I believe each addition is implemented correctly, so I thought maybe there's something more fundamental I'm missing about `Variable`s or `forward`, or perhaps I'm not using `pack_padded_sequence` correctly. Thanks in advance, and great job to everyone who's working hard on PyTorch. It's really terrific.