RNN: pack_padded_sequence, do I need to unpermute indices after sorting sequences by length?

I’m using pack_padded_sequence() to preprocess my input x for an RNN model. This requires me to sort the sequences in a batch by their lengths and permute x accordingly. Does this mean that when I compute the loss with self.criterion(output, answer), I need to permute the answers the same way, i.e. answer = answer[perm_idx]? Or should I instead reverse the permutation on the output (and is there an easier way to do this)?

Reversing index:

reverse_idx = to_var(torch.zeros(len(perm_idx)).long())
for i in range(len(perm_idx)):
    # element i of the sorted batch came from original position perm_idx[i]
    reverse_idx[perm_idx[i]] = i

output = output[reverse_idx]

Trying to compute loss:

output = self.net(x, x_len)
loss = self.criterion(output, answer)

# DEFINITION OF NET.forward()
def forward(self, x, x_len):
    # SORT YOUR TENSORS BY LENGTH!
    x_len, perm_idx = x_len.sort(0, descending=True)
    x = x[perm_idx]

    # pack them up nicely
    packed_input = pack_padded_sequence(x, x_len.data.cpu().numpy(), batch_first=True)

    h0 = to_var(torch.randn(self.rnn_num_layers, x.size(0), self.rnn_hidden_size))

    hiddens, last_hidden = self.gru(packed_input, h0)
    preactivations = self.fc(last_hidden.squeeze())

    return self.softmax(preactivations)
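For completeness, the other option (permuting the answers instead of unsorting the output) would look roughly like the sketch below; this assumes forward() is changed to also return perm_idx, which my current code does not do:

output, perm_idx = self.net(x, x_len)
# reorder the targets the same way the inputs were reordered
loss = self.criterion(output, answer[perm_idx])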

Both ways are fine. An unsort can be done by argsorting the first argsort.


Do you have a code snippet of what you mean by “argsort the first argsort”? This seems like the more elegant solution.

Not on hand. It’s twice as expensive, since you end up sorting twice, but here you go:

>>> x = torch.LongTensor(10).random_(100)
>>> x

 84
  3
 46
 82
 14
 61
 85
 27
 33
 59
[torch.LongTensor of size 10]

>>> sorted_x, argsort_x = x.sort()
>>> sorted_x

  3
 14
 27
 33
 46
 59
 61
 82
 84
 85
[torch.LongTensor of size 10]

>>> _, argargsort_x = argsort_x.sort()
>>> sorted_x[argargsort_x]

 84
  3
 46
 82
 14
 61
 85
 27
 33
 59
[torch.LongTensor of size 10]

>>> x

 84
  3
 46
 82
 14
 61
 85
 27
 33
 59
[torch.LongTensor of size 10]

>>> torch.equal(sorted_x[argargsort_x], x)
True
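Applied to your forward(), a minimal sketch (assuming you want the predictions returned in the original batch order, and reusing the names from your code) would end with something like:

    # invert the permutation by argsorting the argsort
    _, unperm_idx = perm_idx.sort(0)

    preactivations = self.fc(last_hidden.squeeze())
    # put the rows back into the original batch order
    preactivations = preactivations[unperm_idx]

    return self.softmax(preactivations)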


Hi,

I don’t know if this is the right way:

Suppose lengths contains the lengths of all the sentences.

ordered_len, ordered_idx = lengths.sort(0, descending=True)

Then, after we get the result from the RNN, we can unsort it by scattering into a preallocated tensor new_result of the same shape:

new_result[ordered_idx] = result


No. That is wrong. The ordered_idx is a map from sorted element index to original index. You need the reverse of that to achieve an unsort using the line you gave.

Hi Simon,
I think that is equivalent to reversing the permutation.

I tested this idea with the following code:

a = torch.randn(5)
sorted_a, idx = a.sort(0, descending=True)
b = torch.randn(5)
b[idx] = sorted_a
print('a:')
print(a)
print('sorted_a:')
print(sorted_a)
print('b:')
print(b)

The output is as follows:

a:

-1.1561
 1.8450
 0.1463
 0.0551
 1.0577
[torch.FloatTensor of size 5]

sorted_a:

 1.8450
 1.0577
 0.1463
 0.0551
-1.1561
[torch.FloatTensor of size 5]

b:
-1.1561
 1.8450
 0.1463
 0.0551
 1.0577
[torch.FloatTensor of size 5]

Note that b == a, which means we have unsorted sorted_a using the sort index idx.
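To connect this with the double argsort above: the scatter assignment b[idx] = sorted_a applies the inverse permutation implicitly, while the double argsort builds it explicitly. A minimal, self-contained sketch of the equivalence (variable names are just for illustration):

import torch

a = torch.randn(5)
sorted_a, idx = a.sort(0, descending=True)

# explicit inverse permutation: argsort the argsort, then gather
_, inv_idx = idx.sort(0)
b_gather = sorted_a[inv_idx]

# implicit inverse permutation: scatter back with the forward indices
b_scatter = torch.randn(5)
b_scatter[idx] = sorted_a

print(torch.equal(b_gather, a), torch.equal(b_scatter, a))  # True True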


Oh right, my bad. I don’t know why I thought new_result contained the sorted results.