I am using a multilingual BERT model from Hugging Face and doing some additional pretraining with masked language modeling (MLM). However, my setup is a bit different: I want to take the mean over a sequence of [MASK] tokens and compute the loss from that mean.
For example,
model = AutoModel.from_pretrained('bert-base-multilingual-cased')
input = "[MASK][MASK][MASK] is a great guy."
label = "[PERSON] is a great guy."
I want to train the model using a loss between the mean of the three [MASK] tokens and the single [PERSON] token.
What is the best way to do this?
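To make the idea concrete, here is a rough sketch of what I have in mind, not a confirmed solution. It assumes the "mean value" is the average of the final hidden states at the [MASK] positions, and that the target is an embedding vector for [PERSON]. The function name `mean_mask_loss`, the choice of MSE as the loss, and the token ids are all assumptions for illustration. Random tensors stand in for the model output and the embedding table so the snippet runs without downloading anything.

```python
import torch
import torch.nn.functional as F


def mean_mask_loss(hidden_states, input_ids, mask_token_id, target_vectors):
    """Hypothetical helper: MSE between the mean hidden state over the
    [MASK] positions of each sequence and one target vector per sequence.

    hidden_states:  (batch, seq_len, hidden)
    input_ids:      (batch, seq_len)
    target_vectors: (batch, hidden)
    """
    # 1.0 at [MASK] positions, 0.0 elsewhere; shape (batch, seq_len, 1)
    mask = (input_ids == mask_token_id).unsqueeze(-1).to(hidden_states.dtype)
    # Number of [MASK] tokens per sequence; clamp avoids division by zero
    counts = mask.sum(dim=1).clamp(min=1.0)
    # Mean-pool only the masked positions; shape (batch, hidden)
    pooled = (hidden_states * mask).sum(dim=1) / counts
    # MSE is an assumption here; cosine distance would be another option
    return F.mse_loss(pooled, target_vectors), pooled


torch.manual_seed(0)

# Placeholder ids (assumed, not looked up from the real tokenizer)
CLS_ID, SEP_ID, MASK_ID, PERSON_ID = 101, 102, 103, 5

# Stand-in for "[CLS] [MASK] [MASK] [MASK] is a great guy . [SEP]"
input_ids = torch.tensor([[CLS_ID, MASK_ID, MASK_ID, MASK_ID, 10, 11, 12, 13, SEP_ID]])

# Stand-in for the encoder output (last hidden states), hidden size 8
hidden_states = torch.randn(1, 9, 8, requires_grad=True)

# Stand-in for the model's input embedding table; the [PERSON] row is the target
embedding = torch.nn.Embedding(20, 8)
target = embedding(torch.tensor([PERSON_ID])).detach()

loss, pooled = mean_mask_loss(hidden_states, input_ids, MASK_ID, target)
loss.backward()  # gradients reach only the [MASK] positions
```

With the real model, I assume `hidden_states` would be `model(**inputs).last_hidden_state` (since `AutoModel` returns hidden states without an MLM head), `mask_token_id` would be `tokenizer.mask_token_id`, and [PERSON] would first have to be added with `tokenizer.add_tokens` followed by `model.resize_token_embeddings(len(tokenizer))`, with the target row taken from `model.get_input_embeddings()`.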