A bigger dataset does not improve accuracy for my BERT model

I am pre-training a BERT model on a custom dataset. The model's accuracy gets stuck at ~20%. Since it is recommended to first try to overfit a model on a small dataset before moving to a bigger one, I tried that, and my model easily overfits a small dataset (100 samples), as you can see in the plots below.





But when I increase the size of the dataset (250K samples), the accuracy stops improving after a certain point, as you can see below.




I also tried larger models, but the same problem persists. Any suggestions would be very helpful. Also, let me know if more details are required.
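
For concreteness, the small-subset overfitting check mentioned above has roughly the following shape. This is only a toy sketch and not the actual BERT pre-training code: the per-input logit table, the `train_and_score` helper, the 10 classes, and the random labels are all assumptions made up for illustration. The point is the structure of the check: fix a small subset (100 samples), train on it for many epochs, and confirm that training accuracy approaches 100%.

```python
import math
import random


def train_and_score(samples, num_classes, epochs=20, lr=0.5):
    """Fit a per-input logit table with SGD; return final training accuracy.

    The logit table is a hypothetical stand-in for a real model. It has
    enough capacity to memorize every sample, which is all the check needs.
    """
    logits = {x: [0.0] * num_classes for x, _ in samples}
    for _ in range(epochs):
        for x, y in samples:
            row = logits[x]
            # Numerically stable softmax over this input's logits.
            m = max(row)
            exps = [math.exp(v - m) for v in row]
            total = sum(exps)
            probs = [e / total for e in exps]
            # Gradient of cross-entropy w.r.t. the logits: probs - one_hot(y).
            for c in range(num_classes):
                grad = probs[c] - (1.0 if c == y else 0.0)
                row[c] -= lr * grad
    # Count samples whose highest-scoring class matches the label.
    correct = sum(
        1
        for x, y in samples
        if max(range(num_classes), key=lambda c: logits[x][c]) == y
    )
    return correct / len(samples)


random.seed(0)
NUM_CLASSES = 10  # assumed for illustration only
# A fixed subset of 100 samples: distinct input ids with random labels.
small_subset = [(i, random.randrange(NUM_CLASSES)) for i in range(100)]
train_accuracy = train_and_score(small_subset, NUM_CLASSES)
print(f"training accuracy on the small subset: {train_accuracy:.2f}")
```

If a model with enough capacity cannot reach near-perfect training accuracy on such a subset, the problem is likely in the training loop or the data pipeline rather than in the dataset size. In my case this check passes, which is why the plateau on the 250K-sample dataset is confusing.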