Mask BOS token for GPT-2


I am trying to fine-tune a GPT-2 model on a next-token prediction task.

I manually add BOS and EOS tokens to each sentence.

However, I don’t know whether I have to mask these tokens during training. I was thinking of masking them (at least the EOS, for sure), but I have a doubt about the BOS. Indeed, during inference, the BOS is the first token fed to the model, and the next token is predicted from its embedding.

So should I mask it or not?
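To make the question concrete, here is a minimal sketch (no model, plain Python) of what I mean by "masking": setting positions in the labels to -100, the ignore_index convention that PyTorch's CrossEntropyLoss and the Hugging Face trainer use so those positions don't contribute to the loss. The token ids and the helper function are just illustrative, not actual tokenizer output.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss

def build_labels(input_ids, mask_bos=False, mask_eos=True):
    """Copy input_ids to labels, optionally masking the special tokens.

    input_ids is assumed to be [BOS, tok, tok, ..., EOS].
    Masked positions get IGNORE_INDEX so the model is never trained
    to predict that token at that position.
    """
    labels = list(input_ids)
    if mask_bos:
        labels[0] = IGNORE_INDEX
    if mask_eos:
        labels[-1] = IGNORE_INDEX
    return labels

# hypothetical ids: 50256 is GPT-2's <|endoftext|>, reused here as BOS/EOS
ids = [50256, 11, 22, 33, 50256]
print(build_labels(ids))                 # EOS masked only
print(build_labels(ids, mask_bos=True))  # both BOS and EOS masked
```

My doubt in code form: with `mask_bos=True` the model never sees a training signal at the BOS position, yet at inference time the very first prediction is made from the BOS embedding.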

Thank you!