Is it possible to re-train a fine-tuned NER model on a dataset with a different tagset (with respect to the first training dataset)?

Hi all, first time here and very new to NLP.

I used PyTorch to fine-tune XLM-RoBERTa on a German NER dataset that has 7 tags. Let’s call this model xlmr-finetuned.
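For reference, the first fine-tuning was done roughly like this (a sketch; I’m omitting the standard PyTorch training loop and assuming xlm-roberta-base as the starting checkpoint):

```python
from transformers import AutoModelForTokenClassification

# Classification head with 7 outputs, one per tag in the first dataset.
model = AutoModelForTokenClassification.from_pretrained(
    "xlm-roberta-base",
    num_labels=7,
)
# ... fine-tune with a regular PyTorch loop, then save the checkpoint:
model.save_pretrained("xlmr-finetuned")
```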

Now I have another German NER dataset that comes from historical newspapers (so it contains many errors), and I would like to fine-tune or re-train the last model (xlmr-finetuned) on it. The problem is that this historical dataset has 11 tags (the same 7 tags as the first one, plus 4 new ones). So, when I try to load the model with .from_pretrained configured for the new dataset I want to fine-tune it on, I get a size mismatch error between the shapes of the checkpoint (xlmr-finetuned) and the new classification head.
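Roughly, the failing load looks like this (a minimal sketch; the checkpoint path is just where I saved the first model):

```python
from transformers import AutoModelForTokenClassification

# "xlmr-finetuned" is the 7-tag checkpoint from the first fine-tuning;
# the new dataset defines 11 labels, so the classifier head no longer matches.
model = AutoModelForTokenClassification.from_pretrained(
    "xlmr-finetuned",
    num_labels=11,
)
```

which raises: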

```
RuntimeError: Error(s) in loading state_dict for XLMRobertaForTokenClassification:
size mismatch for classifier.weight: copying a param with shape torch.Size([7, 768]) from checkpoint, the shape in current model is torch.Size([11, 768]).
size mismatch for classifier.bias: copying a param with shape torch.Size([7]) from checkpoint, the shape in current model is torch.Size([11]).
```

So, my question is: is it possible to fine-tune an already fine-tuned model with a different tagset? Or, on the contrary, can transfer learning only be done between very similar datasets?

I’m already considering solutions that involve modifying the datasets, like removing the 4 new tags from the new dataset, or telling the model during the first fine-tuning that there are 11 tags instead of 7, even if there are not (see the sketch below).
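For that second idea, I suppose the first fine-tuning run would have to declare the full 11-label head from the start, something like this (a sketch; the 4 extra labels would simply never occur in the first dataset):

```python
from transformers import AutoModelForTokenClassification

# Sketch of the workaround: give the *first* fine-tuning run an 11-label head,
# even though the first dataset only uses 7 of the labels. The 4 unused rows
# of the classifier stay essentially untrained until the second run.
model = AutoModelForTokenClassification.from_pretrained(
    "xlm-roberta-base",
    num_labels=11,
)
```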

Thanks a lot in advance!