Problem unsorting results of a bidirectional LSTM with DataParallel

To run a bidirectional LSTM over words with pack_padded_sequence, the words have to be sorted by length, and I then need to unsort the output to get the results back in the original order. My code is below, but I always get this exception:

line 51, in sent_forward
    hn[indices] = hn # unsort hn
RuntimeError: index 72 is out of bounds for dimension 0 with size 21

This code runs normally on CPU or on a single GPU, so the problem only shows up with DataParallel. Is there any way to solve it?

def __init__(self, ...):
    ...
    self.lstm = nn.LSTM(
        input_size=embedding_size,
        hidden_size=hidden_size,
        bidirectional=bidirectional,
        num_layers=lstm_layers,
        batch_first=True,
    )
...

def forward(self, words, lengths, indices):
    sent_len = words.shape[0]
    # shape of words: (sent_len, max_word_len)

    embedded = self.embedding(words)
    # shape of embedded: (sent_len, max_word_len, embedding_dim)

    packed = nn.utils.rnn.pack_padded_sequence(embedded, lengths, batch_first=True)
    self.lstm.flatten_parameters()
    _, (hn, _) = self.lstm(packed)
    # shape of hn:  (n_layers * n_directions, sent_len, hidden_size)

    hn = hn.permute(1, 0, 2).contiguous().view(sent_len, -1)
    # shape of hn:  (sent_len, n_layers * n_directions * hidden_size) = (sent_len, 2*hidden_size)

    # shape of indices: (sent_len,), the original position of each sorted word
    hn[indices] = hn  # unsort hn
    # unsorted = hn.new_empty(hn.size())
    # unsorted.scatter_(dim=0, index=indices.unsqueeze(-1).expand_as(hn), src=hn)
    return hn
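
One possible explanation, offered as a sketch rather than a confirmed diagnosis: nn.DataParallel splits every tensor argument along dimension 0, so each replica receives only a slice of words, lengths and indices, while the values inside indices still refer to positions in the full batch. That would be consistent with index 72 being out of bounds for a chunk of size 21. Assuming that is the cause, one way around it is to sort and unsort inside forward, so the permutation is always local to the chunk. The WordEncoder class, its constructor arguments and the single-layer setup below are hypothetical and are not taken from the original model:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class WordEncoder(nn.Module):
    """Hypothetical module that sorts, packs and unsorts inside forward()."""

    def __init__(self, vocab_size, embedding_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_size, padding_idx=0)
        self.lstm = nn.LSTM(
            input_size=embedding_size,
            hidden_size=hidden_size,
            bidirectional=True,
            batch_first=True,
        )

    def forward(self, words, lengths):
        # words: (n, max_word_len), lengths: (n,), both in the ORIGINAL order.
        # Under DataParallel, n is the size of this replica's chunk.
        n = words.shape[0]

        # Sort locally, so every index in `order` is smaller than n.
        sorted_lengths, order = lengths.sort(descending=True)
        embedded = self.embedding(words[order])

        # pack_padded_sequence expects the lengths on the CPU.
        packed = pack_padded_sequence(embedded, sorted_lengths.cpu(), batch_first=True)
        _, (hn, _) = self.lstm(packed)

        # hn: (n_directions, n, hidden_size) -> (n, n_directions * hidden_size)
        hn = hn.permute(1, 0, 2).contiguous().view(n, -1)

        # Undo the sort into a fresh tensor instead of writing into hn itself.
        unsorted = torch.empty_like(hn)
        unsorted[order] = hn
        return unsorted
```

With this layout the caller passes words and lengths in their original order and no indices argument is needed. In PyTorch 1.1 and later, pack_padded_sequence(..., enforce_sorted=False) should make the manual sort unnecessary as well, because the LSTM then returns hn in the original batch order.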