Unexpected results with pre-training model - discussion

Aiman_Mutasem-bellh · March 10, 2021, 12:24pm

Hi all

My project is grammar correction based on a multi-head attention model. I have two datasets: the first is synthetic data (3 GB) and the second is the original data (200 MB). The training strategy is as follows:

Load the pre-training (synthetic) data and train on it, saving the best version of the model.

[Image: save_model]

Change the learning rate and the number of epochs, load the pre-trained model, and continue training it on the original dataset.
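
A minimal sketch of this two-stage strategy, assuming PyTorch (the post does not name the framework). The tiny model, the random data, and names such as `build_model`, `train`, and `pretrained.pt` are hypothetical placeholders, not the actual project code. It also checks that the checkpoint weights were really restored before fine-tuning begins, which may be worth ruling out first:

```python
import os
import tempfile

import torch
from torch import nn

torch.manual_seed(0)


def build_model():
    # Placeholder network; stands in for the multi-head attention model.
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))


def make_split(n, noise):
    # Placeholder data; stands in for (source, target) sentence pairs.
    x = torch.randn(n, 8)
    y = x.sum(dim=1, keepdim=True) + noise * torch.randn(n, 1)
    return x, y


def evaluate(model, split):
    # Loss on a held-out split, without gradient tracking.
    model.eval()
    with torch.no_grad():
        return nn.functional.mse_loss(model(split[0]), split[1]).item()


def train(model, lr, epochs, train_split, dev_split, ckpt_path):
    # A fresh optimizer is created here, so the learning rate passed in
    # is the one actually used (nothing is restored from an old optimizer).
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best = float("inf")
    for _ in range(epochs):
        model.train()
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(train_split[0]), train_split[1])
        loss.backward()
        optimizer.step()
        dev_loss = evaluate(model, dev_split)
        if dev_loss < best:
            # Keep only the best version, judged on the dev split.
            best = dev_loss
            torch.save(model.state_dict(), ckpt_path)
    return best


workdir = tempfile.mkdtemp()
pretrained_ckpt = os.path.join(workdir, "pretrained.pt")
finetuned_ckpt = os.path.join(workdir, "finetuned.pt")

synthetic_train, synthetic_dev = make_split(512, 0.5), make_split(128, 0.5)
original_train, original_dev = make_split(64, 0.0), make_split(64, 0.0)

# Stage 1: pre-train on the (large) synthetic data and save the best model.
train(build_model(), 1e-2, 50, synthetic_train, synthetic_dev, pretrained_ckpt)

# Stage 2: load the pre-trained weights into a new model instance.
model = build_model()
saved_state = torch.load(pretrained_ckpt)
model.load_state_dict(saved_state)  # strict by default: key mismatches raise

# Sanity checks before fine-tuning: the weights match the checkpoint, and
# the loaded model can be compared with an untrained one on the original dev set.
weights_restored = all(
    torch.equal(model.state_dict()[name], tensor)
    for name, tensor in saved_state.items()
)
pretrained_dev_loss = evaluate(model, original_dev)
untrained_dev_loss = evaluate(build_model(), original_dev)

# Continue training on the original data with a different learning rate
# and number of epochs.
finetuned_best = train(model, 1e-3, 20, original_train, original_dev, finetuned_ckpt)
```

If `weights_restored` is false, or `pretrained_dev_loss` is no better than `untrained_dev_loss`, the problem is more likely in saving or loading than in the fine-tuning hyperparameters.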

I have a confusing problem with the results: when I train the model starting from the pre-trained model, I get unexpectedly lower scores than with the same model trained from scratch without pre-training. I have no explanation for this, given the difference in size between the two datasets.

Any suggestions on how to overcome this issue?

Kind regards,
Aiman Solyman