I am trying to fine-tune a GPT-2 model on a next-token prediction task.
I manually add BOS and EOS tokens to each sentence.
However, I don’t know whether I should mask these tokens in the loss during training. I was thinking of masking them (at least EOS, for sure), but I have a doubt about BOS: during inference, BOS is the first token fed to the model, and the next token is predicted from its embedding.
So should I mask it or not?
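For context, here is a minimal sketch of what I mean by "masking", assuming the usual Hugging Face convention that a label of `-100` is ignored by the cross-entropy loss (`bos_id`/`eos_id` and `make_labels` are just placeholder names I made up for illustration):

```python
# Label value ignored by CrossEntropyLoss in the usual HF convention.
IGNORE_INDEX = -100

def make_labels(input_ids, bos_id, eos_id, mask_bos=False, mask_eos=True):
    """Copy input_ids into labels, replacing masked special tokens with -100."""
    labels = []
    for tok in input_ids:
        if (mask_eos and tok == eos_id) or (mask_bos and tok == bos_id):
            labels.append(IGNORE_INDEX)  # this position contributes no loss
        else:
            labels.append(tok)
    return labels

# Toy ids: 0 = BOS, 1 = EOS.
print(make_labels([0, 10, 20, 30, 1], bos_id=0, eos_id=1))
# -> [0, 10, 20, 30, -100]  (EOS masked, BOS kept)
```

My question is essentially whether `mask_bos` should be `True` or `False` here.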
Thank you!