Hello,
I have two different datasets:
Dataset 1: 19 million words
Dataset 2: 2 million words
The task is to train the model sequentially on the two datasets. Each dataset has its own training and validation sets, but I have only one test set.
When I train the model for 50 epochs on dataset 1, I get state-of-the-art results. I then continued training on dataset 2 with the same hyperparameters. The problem is that performance on the test set drops by 0.80%.
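To make the procedure concrete, here is a minimal sketch of the two-stage training, not my actual code. It assumes a toy linear model trained with plain SGD in place of the real model, and the data, the names (make_dataset, train, mse), the learning rate, and the epoch count for stage 2 are all made up for illustration. Dataset 2 is deliberately generated from a slightly different distribution, which is only one possible reason for the drop.

```python
import random

def make_dataset(n, slope, seed):
    # Hypothetical toy data: y = slope * x plus a little Gaussian noise.
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, slope * x + rng.gauss(0.0, 0.01)) for x in xs]

def train(weights, data, epochs, lr):
    # Plain SGD on squared error, starting from the given weights.
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)

def mse(weights, data):
    # Mean squared error of the linear model on a dataset.
    w, b = weights
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)

# Stand-ins for the large dataset, the small dataset, and the single test set.
dataset1 = make_dataset(1900, slope=2.0, seed=0)
dataset2 = make_dataset(200, slope=2.5, seed=1)   # assumed to be shifted
test_set = make_dataset(200, slope=2.0, seed=2)

# Stage 1: train from scratch on dataset 1.
stage1 = train((0.0, 0.0), dataset1, epochs=50, lr=0.05)

# Stage 2: continue from the stage-1 weights on dataset 2, same hyperparameters.
stage2 = train(stage1, dataset2, epochs=50, lr=0.05)

print("test MSE after stage 1:", mse(stage1, test_set))
print("test MSE after stage 2:", mse(stage2, test_set))
```

In this toy setup the test error is higher after stage 2 than after stage 1, because the second stage overwrites what was learned from dataset 1.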
My question is: what is the best way to train my model sequentially on two datasets?
Thanks