I’m trying to predict the sentiment towards a target in a sentence, so my model has two inputs: the whole sentence and the target, both of variable length. I’m going to feed them into two different Bi-LSTMs and do some further processing.
In order to train my model in batches and avoid computing the masked timesteps, I have to use
torch.nn.utils.rnn.pad_sequence to pad my sentences and targets and then
torch.nn.utils.rnn.pack_padded_sequence to pack them. However,
pack_padded_sequence requires the sequences in a batch to be sorted in order of decreasing length. That is easy to satisfy when the model has only one input, but it is hard in my case, because a longer sentence may have a shorter target.
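To make the requirement concrete, here is a minimal sketch of the pad-then-pack pattern for a single input, with made-up word-index data; the sorting step is what becomes problematic with two inputs:

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# Three variable-length sequences of word indices (illustrative data).
seqs = [torch.tensor([1, 2, 3, 4]), torch.tensor([5, 6]), torch.tensor([7, 8, 9])]
lengths = torch.tensor([len(s) for s in seqs])

# Sort by decreasing length, as pack_padded_sequence expects by default.
lengths, sort_idx = lengths.sort(descending=True)
seqs = [seqs[i] for i in sort_idx]

padded = pad_sequence(seqs, batch_first=True)               # shape: (3, 4)
packed = pack_padded_sequence(padded, lengths, batch_first=True)
```

With two inputs per example, this sort permutes the batch differently for sentences and for targets, which is exactly the problem below.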
Let me give an example. Below are two samples from the dataset, where each ‘x’ stands for a word:
- sentence: xxxxxxxxxx, target: xx
- sentence: xxxxxxx, target: xxx
In this case I can’t sort them consistently. To pad the sentences I should put sample 1 above sample 2, but then the targets are in the wrong order, because the target in sample 1 is the shorter one.
So how can I solve this problem? Is there any workaround (to avoid computing the masked timesteps)?
Later I solved the problem myself.
The key is that I don’t have to sort the examples like that.
I can record the original indices of the sentences and targets, sort each of them individually, pack them and feed them into their RNNs, unpack the outputs, and finally restore the original order. That way the two outputs are aligned again and I can compute the loss correctly.
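The steps above can be sketched as a small helper that each input goes through independently. The embedding size, hidden size, and the `run_sorted` name are my own illustrative choices, not part of the original model:

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

def run_sorted(rnn, seqs):
    """Sort by length, pack, run the RNN, unpack, and restore dataset order.

    `seqs` is a list of (len_i, input_size) tensors in dataset order.
    """
    lengths = torch.tensor([s.size(0) for s in seqs])
    sorted_lengths, sort_idx = lengths.sort(descending=True)
    packed = pack_padded_sequence(
        pad_sequence([seqs[i] for i in sort_idx], batch_first=True),
        sorted_lengths, batch_first=True)
    out, _ = rnn(packed)
    out, _ = pad_packed_sequence(out, batch_first=True)  # (batch, max_len, 2*hidden)
    # Invert the permutation so row i corresponds to sample i again.
    unsort_idx = sort_idx.argsort()
    return out[unsort_idx]

# Hypothetical sizes: two Bi-LSTMs over sentence and target embeddings.
sent_rnn = nn.LSTM(8, 16, batch_first=True, bidirectional=True)
targ_rnn = nn.LSTM(8, 16, batch_first=True, bidirectional=True)

sentences = [torch.randn(10, 8), torch.randn(7, 8)]  # sample 1 is longer
targets = [torch.randn(2, 8), torch.randn(3, 8)]     # sample 2 is longer

sent_out = run_sorted(sent_rnn, sentences)  # rows align with dataset order
targ_out = run_sorted(targ_rnn, targets)    # so do these
```

As a side note, more recent PyTorch versions accept `pack_padded_sequence(..., enforce_sorted=False)`, which does this sorting and unsorting of the packed input internally, so the manual bookkeeping is mainly needed on older versions (or when ONNX export requires sorted input).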