I am working on NLP tagging models (i.e., models that produce one tag per input token). In this context, I am working with:
- padded sequences
- BertModel / BiLSTM stack
Ideally, I would like to prevent any padding side effects (i.e., no backpropagation through padded states). What is the correct way to handle this? I found the PackedSequence object, but it's not clear how it behaves, especially when several models are stacked.
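For reference, here is a minimal sketch of the kind of stack I mean, using `pack_padded_sequence` / `pad_packed_sequence` around the BiLSTM. To keep the snippet self-contained, a random tensor stands in for the BertModel output (in my real model, the encoder's last hidden state would take its place):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Toy batch: 3 padded sequences with true lengths 5, 3, 2; hidden size 8.
# A random tensor stands in for the BertModel last_hidden_state here.
batch, max_len, hidden = 3, 5, 8
x = torch.randn(batch, max_len, hidden)
lengths = torch.tensor([5, 3, 2])

lstm = nn.LSTM(hidden, 4, bidirectional=True, batch_first=True)

# Pack so the LSTM never iterates over (or backprops through) pad positions.
packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
out_packed, _ = lstm(packed)

# Unpack back to a padded (batch, max_len, 2 * 4) tensor for the tag head.
out, out_lens = pad_packed_sequence(out_packed, batch_first=True, total_length=max_len)

print(out.shape)  # torch.Size([3, 5, 8])
# Pad positions come back as zeros after unpacking:
print(out[2, 2:].abs().sum().item())  # 0.0
```

Is wrapping each recurrent layer like this the recommended approach, or is there a cleaner pattern when stacking several models?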
Thanks for your help, or for any pointer to best practices on this topic!