I am working on sequence labelling for small texts. Specifically, I use a BERT model from the Hugging Face transformers library (BertModel in particular), and I tokenize every text with the library's tokenizer to feed the model. Since the texts are small, I have set the tokenizer's sequence length to 256, so shorter texts are padded up to that length. My labels are binary (1 and 0), and every sequence element (every BERT input token) is assigned a label.
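Here is a minimal sketch of my setup (the model name, the texts, and the classifier head are just illustrative placeholders):

```python
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

texts = ["a short example text"]  # my real texts are also small
encoding = tokenizer(
    texts,
    padding="max_length",   # pad every sequence to 256 tokens
    truncation=True,
    max_length=256,
    return_tensors="pt",
)

outputs = model(**encoding)
hidden = outputs.last_hidden_state       # (batch, 256, 768)
classifier = torch.nn.Linear(768, 1)     # token-level binary head
logits = classifier(hidden).squeeze(-1)  # (batch, 256), one logit per token
```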
For the loss computation I use binary cross-entropy (BCEWithLogitsLoss), but the function also includes the padding tokens when computing the loss, which in turn affects backpropagation.
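This is roughly how I compute the loss right now (continuing from the sketch above; the labels here are random placeholders, my real labels come from the data):

```python
loss_fn = torch.nn.BCEWithLogitsLoss()
labels = torch.randint(0, 2, logits.shape).float()  # (batch, 256) binary labels
loss = loss_fn(logits, labels)  # averaged over ALL 256 positions, padding included
loss.backward()
```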
I want BCEWithLogitsLoss to compute the loss only on the tokens of the actual text and not on the padding tokens. What is the best way to achieve that?
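For reference, one idea I was considering is to use the attention_mask returned by the tokenizer to zero out the padding positions, roughly like the sketch below (same variables as above), but I am not sure whether this is the recommended approach or whether there is a better one:

```python
# reduction="none" gives a per-token loss; the attention mask is 1 for
# real tokens and 0 for padding, so averaging over mask.sum() ignores padding
loss_fn = torch.nn.BCEWithLogitsLoss(reduction="none")
per_token_loss = loss_fn(logits, labels)       # (batch, 256)
mask = encoding["attention_mask"].float()      # (batch, 256)
loss = (per_token_loss * mask).sum() / mask.sum()
```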