LSTM: Padding within sequence

My sequences can contain padding anywhere within them, as well as at the start and at the end.
I know how to use pad_sequence and pack_padded_sequence to add padding at the end of the sequences so that they all have the same length.
Does anyone know whether there exists something similar that I could use in my case?
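
For reference, the end-padding workflow I mean looks roughly like this (a minimal sketch with made-up tensors):

import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# Three variable-length sequences (made-up data)
seqs = [torch.tensor([1.0, 2.0, 3.0]), torch.tensor([4.0, 5.0]), torch.tensor([6.0])]
lengths = torch.tensor([len(s) for s in seqs])

# Pad at the end so all sequences have the same length: shape (batch, max_len)
padded = pad_sequence(seqs, batch_first=True)

# Pack so the LSTM can skip the trailing padding
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)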

Do you mean that you want to pad sequences by inserting 0s anywhere in the sequence, and not just at the end?

Yes, exactly. I have multiple time series that are missing some time steps at different points. I padded all sequences to the same length by inserting zeroes where data is missing, and I now want to “tell” the model not to pay attention to these time steps.
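
For example (a toy sketch, the time indices and values are made up), a series observed at steps 0, 1, 3, and 5 of a length-6 grid would be padded like this:

import torch

# Observed time steps and values for one series (made-up example)
observed_steps = torch.tensor([0, 1, 3, 5])
observed_values = torch.tensor([1.0, 2.0, 3.0, 8.0])

# Dense series on a fixed-length grid, with zeros where data is missing
seq_len = 6
seq = torch.zeros(seq_len)
seq[observed_steps] = observed_values
print(seq)  # tensor([1., 2., 0., 3., 0., 8.])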

So, essentially it boils down to constructing an attention mask that has a 0 wherever the sequence is 0, and a 1 otherwise. Is that it? What model are you using?

If yes, can we then construct the attention mask like this:

import torch

seq = torch.tensor([1.0, 2.0, 0, 3, 0, 8, 9, 0])
attn = torch.where(seq != 0, 1, 0)  # 1 for real values, 0 for padding
print(attn)

gives:

tensor([1, 1, 0, 1, 0, 1, 1, 0])

Yes, exactly. Thank you!
I am using an LSTM. How can I now add the attention mask to the model?
One more thing: I want to label the sequences (many-to-one), so I only need the last time step’s hidden state. Once I know how to tell the model which entries to pay attention to, how can I make sure that it returns the hidden state of the last time step that is not padded?
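
To make the second question concrete, this is roughly the kind of indexing I imagine for picking out the last non-padded time step (only a sketch, assuming the padded steps are exactly the zero entries and the LSTM is run with batch_first=True):

import torch
import torch.nn as nn

# Made-up batch of shape (batch, seq_len, input_size); zeros mark padded steps
x = torch.tensor([[[1.0], [2.0], [0.0], [3.0], [0.0]],
                  [[4.0], [5.0], [6.0], [0.0], [0.0]]])
mask = (x.abs().sum(dim=-1) != 0).long()  # (batch, seq_len), 1 = real step

lstm = nn.LSTM(input_size=1, hidden_size=4, batch_first=True)
out, _ = lstm(x)  # (batch, seq_len, hidden_size)

# Index of the last non-padded step in each sequence
last_idx = mask.cumsum(dim=1).argmax(dim=1)  # (batch,)
last_hidden = out[torch.arange(out.size(0)), last_idx]  # (batch, hidden_size)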