Padding problem with multiple inputs of variable length

huhk-sysu · June 20, 2018, 8:01pm

Hi everyone.

I’m trying to predict the sentiment towards a target in a sentence, so my model has two inputs: the whole sentence and the target, and both can vary in length. I’m going to feed them into two separate Bi-LSTMs and then do some further processing.

In order to train my model in batches while skipping the padded timesteps, I use torch.nn.utils.rnn.pad_sequence to pad my sentences and targets and then torch.nn.utils.rnn.pack_padded_sequence to pack them. However, pad_sequence requires the list of sequences to be sorted by decreasing length. That is easy to satisfy when the model has only one input, but hard in my case, because a longer sentence may have a shorter target.
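For a single input, the pad-then-pack pipeline looks roughly like the sketch below. It is written against a recent PyTorch, and the sizes and random "embeddings" are placeholders, not the author's setup. The batch is already sorted longest first, which is what pack_padded_sequence expects with its default enforce_sorted=True.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
EMB_DIM, HIDDEN = 8, 16  # hypothetical embedding and hidden sizes

# Three already-embedded sequences, sorted longest first
seqs = [torch.randn(5, EMB_DIM), torch.randn(3, EMB_DIM), torch.randn(2, EMB_DIM)]
lengths = torch.tensor([s.size(0) for s in seqs])  # tensor([5, 3, 2])

padded = pad_sequence(seqs, batch_first=True)  # shape (3, 5, EMB_DIM), zero-padded
# enforce_sorted defaults to True, so lengths must be in decreasing order
packed = pack_padded_sequence(padded, lengths, batch_first=True)

lstm = nn.LSTM(EMB_DIM, HIDDEN, batch_first=True, bidirectional=True)
packed_out, (h_n, c_n) = lstm(packed)  # padded timesteps are not processed
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)

print(out.shape)    # torch.Size([3, 5, 32])
print(out_lengths)  # tensor([5, 3, 2])
```

Positions past each sequence's length in `out` are filled with zeros by pad_packed_sequence.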

Here is an example. Below are two samples from the dataset, where each ‘x’ stands for a word:

Sample 1: sentence: xxxxxxxxxx (10 words), target: xx (2 words)
Sample 2: sentence: xxxxxxx (7 words), target: xxx (3 words)

In this case there is no single order that works for both inputs. To sort the sentences by decreasing length, sample 1 must come before sample 2, but then the targets are not in decreasing order, because the target in sample 1 is shorter.
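The conflict can be checked mechanically. This small plain-Python sketch uses the word counts from the two samples above and computes the order each input would need; `order_by_decreasing` is a hypothetical helper for illustration only.

```python
# Lengths (in words) of the two samples from the example above
sentence_lengths = [10, 7]
target_lengths = [2, 3]


def order_by_decreasing(lengths):
    """Return the sample indices ordered so that their lengths decrease."""
    return sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)


sentence_order = order_by_decreasing(sentence_lengths)
target_order = order_by_decreasing(target_lengths)

print(sentence_order)  # [0, 1] -> sample 1 must come first
print(target_order)    # [1, 0] -> sample 2 must come first
```

Because the two required orders differ, no single permutation of the batch satisfies both inputs.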

How can I solve this? Is there a workaround that still avoids computing the padded timesteps?

Later I solved the problem myself.

The key is that the sentences and the targets don’t have to be sorted in the same order.

I record the original indices of the sentences and the targets, sort each input individually, pack it, run it through its RNN, unpack the output, and finally restore the original order. This way every sentence stays aligned with its own target and label, so the loss is computed correctly.
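A minimal sketch of this record-sort-pack-unpack-reorder approach, assuming a recent PyTorch and `batch_first=True` LSTMs. The helper `encode_unsorted`, the layer sizes, and the random embeddings are hypothetical placeholders, not the author's code.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence


def encode_unsorted(lstm, seqs):
    """Run a batch_first LSTM on a list of (len_i, emb) tensors given in any order.

    Returns padded outputs and final hidden states in the ORIGINAL order of `seqs`.
    """
    lengths = torch.tensor([s.size(0) for s in seqs])
    # 1. Record the permutation that sorts the batch by decreasing length
    lengths_sorted, sort_idx = lengths.sort(descending=True)
    # 2. Pad and pack the sorted batch
    padded = pad_sequence([seqs[i] for i in sort_idx.tolist()], batch_first=True)
    packed = pack_padded_sequence(padded, lengths_sorted, batch_first=True)
    # 3. Run the RNN and unpack; padded timesteps are never computed
    packed_out, (h_n, _) = lstm(packed)
    out, _ = pad_packed_sequence(packed_out, batch_first=True)
    # 4. Undo the sort: argsort of a permutation is its inverse
    unsort_idx = sort_idx.argsort()
    return out[unsort_idx], h_n[:, unsort_idx]


torch.manual_seed(0)
EMB_DIM, HIDDEN = 8, 16  # hypothetical sizes
sentence_lstm = nn.LSTM(EMB_DIM, HIDDEN, batch_first=True, bidirectional=True)
target_lstm = nn.LSTM(EMB_DIM, HIDDEN, batch_first=True, bidirectional=True)

# The two samples from the question: (10-word sentence, 2-word target), (7, 3)
sentences = [torch.randn(10, EMB_DIM), torch.randn(7, EMB_DIM)]
targets = [torch.randn(2, EMB_DIM), torch.randn(3, EMB_DIM)]

sent_out, sent_h = encode_unsorted(sentence_lstm, sentences)
tgt_out, tgt_h = encode_unsorted(target_lstm, targets)

# Row i of each result belongs to sample i, so the two inputs can be combined
sent_vec = torch.cat([sent_h[0], sent_h[1]], dim=1)  # forward + backward final states
tgt_vec = torch.cat([tgt_h[0], tgt_h[1]], dim=1)
features = torch.cat([sent_vec, tgt_vec], dim=1)     # shape (2, 4 * HIDDEN)
```

In PyTorch 1.1 and later, `pack_padded_sequence(..., enforce_sorted=False)` performs this sort and unsort internally: the resulting PackedSequence records the permutation, and both `pad_packed_sequence` and the hidden state returned by `nn.LSTM` come back in the original batch order.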