Pack_padded_sequence with inconsistent (repeated) padding

m_h · January 21, 2022, 2:33pm

Following the example from the docs, I am trying to solve a problem where the padding is inconsistent: instead of always sitting at the end of each sequence, it can appear at the start, the end, or both (in other words, no pun intended, my sequences are left-censored and right-censored across the batch):

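import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Data structure example from the docs (padding only at the end)
seq = torch.tensor([[1, 2, 0], [3, 0, 0], [4, 5, 6]])
# Data structure of my problem (padding at the start and/or the end)
inconsistent_seq = torch.tensor([[1, 2, 0], [0, 3, 0], [0, 5, 6]])

lens = ...?
packed = pack_padded_sequence(seq, lens, batch_first=True, enforce_sorted=False)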

How can I mask these padded 0’s when running the sequences through an LSTM, preferably using built-in PyTorch functionality?

AbdulsalamBande · January 22, 2022, 5:37pm

If I have understood your problem correctly, some data points are padded on the left while others are padded on the right. Padded 0’s are normally there to make all data points the same length, and they are usually added at the end, as in the docs example above. So if your padding is inconsistent (left or right), you might need a separate padding value instead of 0.
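To illustrate what a separate padding value could look like, here is a minimal sketch. The sentinel `PAD = -1` and the sample values are assumptions chosen for illustration, not something from the original data; any value that cannot occur as a real observation would do. The point is only that a dedicated sentinel lets 0 remain a legitimate observation while a boolean mask marks the steps to ignore.

```python
import torch

# Hypothetical sentinel: assumed never to occur as a real observation.
PAD = -1

# Same layout as the inconsistent example, but with PAD marking the padding,
# so a genuine 0 (row 0, step 1) can remain part of the data.
seq = torch.tensor([[1, 0, PAD], [PAD, 3, PAD], [PAD, 5, 6]])

mask = seq != PAD        # True where a real observation is present
lens = mask.sum(dim=1)   # number of real observations per sequence
```

The mask only identifies the padded steps; it does not by itself stop an LSTM from processing them.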

m_h · January 24, 2022, 7:03am

You have understood it correctly. The position in the time series carries information, so I cannot simply move all the padding to the end. It would not make much sense to use any value other than 0, because I am not looking to impute anything; I want to mask every time step that should not be fed into the network, so that it has no impact on the weights.
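One possible way to do this with built-in functionality, offered as a sketch under assumptions rather than a definitive answer: build a mask, move the valid steps of each row to the front with a stable sort, pack the result, and scatter the LSTM outputs back to their original time positions afterwards. The sketch assumes that 0 never occurs as a real observation, that there is one feature per time step, and that every row has at least one valid step; `hidden_size=4` and all variable names are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

inconsistent_seq = torch.tensor([[1, 2, 0], [0, 3, 0], [0, 5, 6]])
T = inconsistent_seq.size(1)

# Assumption: 0 is padding only and never a real observation.
mask = inconsistent_seq != 0
lens = mask.sum(dim=1)  # valid steps per row (each must be >= 1 for packing)

# Stable sort on "is padding" moves valid steps to the front and keeps
# their original order (the `stable` argument needs PyTorch >= 1.9).
_, order = torch.sort((~mask).long(), dim=1, stable=True)
left_aligned = torch.gather(inconsistent_seq, 1, order)

# Pack so the LSTM only ever sees the valid steps.
x = left_aligned.unsqueeze(-1).float()  # (batch, time, features=1)
packed = pack_padded_sequence(x, lens.cpu(), batch_first=True, enforce_sorted=False)

lstm = nn.LSTM(input_size=1, hidden_size=4, batch_first=True)
packed_out, (h, c) = lstm(packed)

# Unpack to full length, then put each output back at its original position.
out, _ = pad_packed_sequence(packed_out, batch_first=True, total_length=T)
index = order.unsqueeze(-1).expand_as(out)
restored = torch.zeros_like(out).scatter(1, index, out)
```

Here `restored[b, t]` is zero wherever step `t` of row `b` was padding, so the outputs stay aligned with the original time axis. Note that the LSTM itself starts from its initial state at the first valid step of each row, so if the absolute position matters to the model it would have to be supplied as an extra input feature.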