[Solved] Multiple PackedSequence input ordering

@aron-bordin Hey, wouldn’t it be easier if you change this from:

to

output, (ht, ct) = self.rnn(packed, prev_hidden)
decoded = ht[-1]

Both the decoded above should be the same.

Doesn’t the output of self.rnn already give you the output for t=seq_len, even though we have variable-length inputs?

Also, may I ask what your prev_hidden is? I thought it should be the initial hidden and cell state.

1 Like

Both the decoded above should be the same.

It does look like the two are equivalent. Could anyone please confirm if using ht[-1] is appropriate?

No, they are not the same.
The problem here is that I’m passing a batch of variable-length sequences, so some of them will be padded with zero vectors, which leads to unwanted outputs from the RNN.
The RNN produces one output per timestep, and since some of those timesteps are just padding, it’s necessary to pick the last valid timestep to get the proper output.
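A minimal sketch of that indexing, assuming output is the padded RNN output with batch_first=True and lengths holds the true length of each sequence (names are illustrative):

import torch

# toy padded output: (batch, max_seq_len, hidden)
batch, max_len, hidden = 4, 7, 16
output = torch.randn(batch, max_len, hidden)
lengths = torch.tensor([7, 5, 3, 2])

# take the output at the last *valid* timestep of each sequence,
# rather than at max_len - 1, which may fall inside the padding
last_valid = output[torch.arange(batch), lengths - 1]   # (batch, hidden)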

1 Like

My input is also a batch of variable-length sequences. I also padded them and packed them using pack_padded_sequence.

Using ht[-1] returns the same result as your solution.

I think you are expecting ht[-1] to return zeros for the shorter inputs that were padded, right? But that’s not the case when I tested it. Would you mind double-checking? Or am I missing something here?
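For what it’s worth, a quick self-contained check of that claim for a unidirectional, single-layer LSTM: the h_n returned for a packed input should match the output gathered at each sequence’s true last timestep.

import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(3, 5, 8)            # padded batch (batch, max_len, input_size)
lengths = torch.tensor([5, 3, 2])   # sorted in decreasing order

packed = pack_padded_sequence(x, lengths, batch_first=True)
packed_out, (h_n, c_n) = lstm(packed)
out, _ = pad_packed_sequence(packed_out, batch_first=True)

# last valid output of each sequence, gathered with the true lengths
gathered = out[torch.arange(3), lengths - 1]

print(torch.allclose(h_n[-1], gathered))   # expected: True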

2 Likes

I’ll check this on Monday and post here. Some points to consider: I’m using both LSTM and GRU RNNs, with a bidirectional architecture. I remember that I first tested with -1, but it was not working in my case, so I then used the solution above. I’ll confirm it at the beginning of the week :wink:
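For the bidirectional case, note that h_n has shape (num_layers * num_directions, batch, hidden_size), so ht[-1] is only the backward direction of the last layer; that alone could explain why -1 did not work there. A rough sketch of pulling out both directions of the last layer:

import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence

lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=2,
               batch_first=True, bidirectional=True)
x = torch.randn(3, 5, 8)
lengths = torch.tensor([5, 3, 2])

packed = pack_padded_sequence(x, lengths, batch_first=True)
_, (h_n, _) = lstm(packed)        # h_n: (num_layers * 2, batch, hidden_size)

# for the last layer, index -2 is the forward direction and -1 the backward one
decoded = torch.cat([h_n[-2], h_n[-1]], dim=1)   # (batch, 2 * hidden_size)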

My prev_hidden is usually zero while training; I just have the parameter so I can evaluate the model from some specific starting points.
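For reference, when that second argument is omitted, nn.LSTM defaults to zero initial states, so passing it explicitly only matters when you want to start from a specific state. A small sketch of the expected shapes:

import torch
from torch import nn

lstm = nn.LSTM(input_size=8, hidden_size=16, num_layers=1, batch_first=True)
x = torch.randn(3, 5, 8)

# initial states have shape (num_layers * num_directions, batch, hidden_size)
h_0 = torch.zeros(1, 3, 16)
c_0 = torch.zeros(1, 3, 16)

out_default, _ = lstm(x)                 # defaults to zero initial states
out_explicit, _ = lstm(x, (h_0, c_0))    # same result, states passed explicitly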

Does anyone have sample code for batched multiple packed sequences with Attention?

Here’s where I am confused:

Background: I have a data set with a series of sentence tuples (s1, s2). I have a bidirectional LSTM for each of them, and I would like to attend over the two sequences s1 and s2 before sending them to a linear layer. I would also like to do this in a batch setting, though the pseudocode below is written as a per-instance (i.e. per (s1, s2) tuple) forward pass.

Something like this (not working code – pseudo code)

    # concatenate the LSTM outputs for s1 and s2 along the time dimension
    combined_input_2_attention = torch.cat((s1_lstm_out, s2_lstm_out), 0)
    # attention weights over the combined timesteps
    attention_alphas = self.softmax(self.attention_layer(combined_input_2_attention))
    # weighted combination of the outputs (note: both bmm operands must be 3D)
    attn_applied = torch.bmm(attention_alphas.unsqueeze(0),
                             combined_input_2_attention.unsqueeze(0))

    output_embedding = self.hidden_layer(attn_applied)
    output = self.softmax(output_embedding)

where s1_lstm_out and s2_lstm_out are the outputs from passing a single (s1, s2) tuple through the forward pass.

Q1. If this were batched, do the attention weights (alphas) need to have the dimension of the maximum sequence length (of s1 + s2, since I am concatenating) per batch, or globally? I doubt this can be done per batch, because how would I initialize the dimensions of the linear layer that computes the attention?

Q2. Either way, I need packed sequences for the two LSTMs before I attend over them or concatenate them, but the problem is that padding requires them to be sorted individually.
I read the answer about using sort, but I could not follow it completely.

Here’s what I understood from the post above:

  1. I take a set of tuples (s1, s2), sort them individually using tensor.sort, and keep track of their original indices.
  2. I individually pad and pack s1 and s2 based on their individual max lengths per batch and send them to my forward pass.
  3. At the end, I return my outputs by reverse-mapping them based on the sort order generated in (1)? If yes, wouldn’t the network have seen the instances of s1 and s2 in different orders from my original pairing – why is this correct / why does this work? What am I missing?

Any leads /clarification would be extremely helpful! Thanks!

Figured this out –

sequence 1: sort -> pad and pack -> process using RNN -> unpack -> unsort
sequence 2: sort -> pad and pack -> process using RNN -> unpack -> unsort

Do whatever you wanted to do with the unsorted outputs (combine, attend-- whatever)
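A rough sketch of that pipeline for one of the two sequences (the other is handled identically, and the unsorted outputs can then be concatenated or attended over); names are illustrative:

import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

rnn = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(4, 6, 8)               # padded batch (batch, max_len, features)
lengths = torch.tensor([3, 6, 2, 5])   # true lengths, in the original (paired) order

# sort -> pack -> RNN -> unpack -> unsort
lengths_sorted, sort_idx = lengths.sort(descending=True)
packed = pack_padded_sequence(x[sort_idx], lengths_sorted, batch_first=True)
packed_out, _ = rnn(packed)
out_sorted, _ = pad_packed_sequence(packed_out, batch_first=True)

unsort_idx = sort_idx.argsort()        # inverse of the sorting permutation
out = out_sorted[unsort_idx]           # back in the original order, pairing preserved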

2 Likes

I think you are better off just keeping it sorted throughout training, as sorting operations are usually faster on the CPU (therefore it is better to do the sorting at the data-loading stage).

2 Likes

@miguelvr Thanks - the reason I am unsorting is that my two sequences are “paired” but pass through different RNNs. If I don’t unsort, I lose the pairing (because the packing was done independently).

1 Like

@aron-bordin: I have a couple of questions. To confirm, dict_index is sorted and original_index is the inverse of the sorting permutation? Is that correct, and how do you get the inverse of the sorting permutation from torch.sort?

The second output of sort is the indices into the original ordering.

You can use these indices and a scatter_ operation to unsort to the original permutation.

x = torch.randn(10)
y, ind = torch.sort(x, 0)
unsorted = y.new(*y.size())        # uninitialized tensor with the same size and type as y
unsorted.scatter_(0, ind, y)       # write each sorted value back to its original position
print((x - unsorted).abs().max())  # prints 0: unsorted equals x

14 Likes

Thank you for the sample code; it helps to unsort the indices. However, how could you do something similar for the hidden states, to unsort them back into the initial input order? I’m having some trouble because it’s a 3D tensor and ind is only 1D.

Thank you !

EDIT: A way without scatter would be How to properly unsort unpacked sequences?, but I’m not sure that the gradient is propagated properly, and it does not look very efficient. Can someone confirm this?
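One scatter-free way that also handles the 3D hidden states is to build the inverse permutation once and index along the batch dimension; plain index-based selection like this is differentiable with respect to the values, so gradients flow through it:

import torch

# h_n: (num_layers * num_directions, batch, hidden_size); batch is dim 1
h_n = torch.randn(2, 4, 16, requires_grad=True)
sorted_idx = torch.tensor([2, 0, 3, 1])   # permutation used when sorting the batch

unsorted_idx = sorted_idx.argsort()       # inverse permutation
h_n_unsorted = h_n[:, unsorted_idx, :]    # reorder along the batch dimension

h_n_unsorted.sum().backward()
print(h_n.grad.shape)                     # torch.Size([2, 4, 16])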

Any idea if it’s a feature already?
Thanks in advance!

Can you post a snippet of your code?

1 Like

lengths = torch.tensor([len(indices) for indices in indices_list], dtype=torch.long, device=device)
lengths_sorted, sorted_idx = lengths.sort(descending=True)

indices_padded = pad_lists(indices_list, padding_idx, dtype=torch.long, device=device) # custom function
indices_sorted = indices_padded[sorted_idx]

embeddings_padded = self.embedding(indices_sorted)
embeddings_packed = pack_padded_sequence(embeddings_padded, lengths_sorted.tolist(), batch_first=True)

h, (h_n, _) = self.lstm(embeddings_packed)

h, _ = pad_packed_sequence(h, batch_first=True, padding_value=padding_idx)

# Reverses the sorting: scatter each row of h back to its original batch position.
h = torch.zeros_like(h).scatter_(0, sorted_idx.unsqueeze(1).unsqueeze(1).expand(-1, h.shape[1], h.shape[2]), h)

This should help.

4 Likes

Thanks! I’ve been trying to figure out how to reverse the sorting and this is the best solution so far.

Isn’t it a good idea to just sort the labels (classes) the same way after sorting the data, instead of reversing the order?

Actually they are the same.

Do we still need to sort the batch by decreasing sequence length before pack_padded_sequence, or has that been improved recently?

@Diego999 I think so. In PyTorch 1.1 you don’t need to sort the sequences yourself.
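For reference, a short sketch of that behaviour: from PyTorch 1.1 on, pack_padded_sequence accepts enforce_sorted=False and does the sorting/unsorting internally, so the outputs come back in the original batch order.

import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

rnn = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(3, 5, 8)
lengths = torch.tensor([2, 5, 3])   # NOT sorted by decreasing length

packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
packed_out, (h_n, c_n) = rnn(packed)
out, _ = pad_packed_sequence(packed_out, batch_first=True)   # original batch order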