Stopping backpropagation through padded positions in sequence models


I am building NLP tagging models (i.e. producing one tag per input token). In this context, I am working with:

  • padded sequences
  • BertModel / BiLSTM stack

Ideally, I would prevent any padding side-effect (i.e. no backpropagation through padded states). What is the correct way to handle this? I found the PackedSequence object, but it’s not clear how it behaves, especially when several models are stacked.

Thanks for your help, or any pointer to best practices on this topic!

Use masking over the padded sequence.
You can either create your own mask or use `masked_fill`.
Check out this blog post.
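To make that concrete, here is a minimal sketch combining the two usual ingredients for a tagging stack like yours: `pack_padded_sequence` so the BiLSTM never computes over padded timesteps, and `masked_fill` on the labels so `CrossEntropyLoss` (via `ignore_index`) contributes no gradient for padded positions. All shapes and values below are toy/illustrative, not from your setup; the `embeddings` tensor stands in for BERT outputs.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Toy tagging batch (illustrative shapes): 2 sequences padded to
# length 4, 3 possible tags, hidden size 8.
batch, seq_len, num_tags, hidden = 2, 4, 3, 8
embeddings = torch.randn(batch, seq_len, hidden, requires_grad=True)
tags = torch.tensor([[0, 2, 1, 1],
                     [1, 0, 1, 1]])           # padded slots hold junk labels
lengths = torch.tensor([4, 2])                # true lengths before padding

# 1) Feed the BiLSTM a PackedSequence: padded timesteps are simply
#    never computed, so they cannot leak into hidden states or gradients.
lstm = nn.LSTM(hidden, hidden, bidirectional=True, batch_first=True)
packed = pack_padded_sequence(embeddings, lengths,
                              batch_first=True, enforce_sorted=False)
packed_out, _ = lstm(packed)
lstm_out, _ = pad_packed_sequence(packed_out, batch_first=True,
                                  total_length=seq_len)

# 2) Mask the loss: boolean mask is True on real tokens, False on padding.
logits = nn.Linear(2 * hidden, num_tags)(lstm_out)
mask = torch.arange(seq_len).unsqueeze(0) < lengths.unsqueeze(1)

# masked_fill the padded labels with ignore_index so CrossEntropyLoss
# skips them entirely (no loss term, hence no gradient, for padding).
criterion = nn.CrossEntropyLoss(ignore_index=-100)
loss = criterion(logits.reshape(-1, num_tags),
                 tags.masked_fill(~mask, -100).reshape(-1))
loss.backward()
```

Because the padded timesteps never enter the packed computation, the gradient flowing back to `embeddings` at those positions is exactly zero, which is the "no backpropagation on padded states" behavior you are after. When stacking models, re-pad between stages (as `pad_packed_sequence` does here) and carry `lengths` along.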