PyTorch: What is the best way to train a model using two datasets?

Hello, :slight_smile:

I have two different datasets:

Dataset 1: 19 million words
Dataset 2: 2 million words

The task is to train the model sequentially on the two datasets. Each dataset has its own training and validation set, and I have only one test set.

When I train the model for 50 epochs on dataset 1, I get state-of-the-art results. Then I continued training on dataset 2 with the same hyperparameters. The problem is that performance decreased by 0.80%.
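To make the setup concrete, here is a minimal toy sketch of the sequential schedule I mean (random tensors and a tiny classifier stand in for the real datasets and model; the sizes, epochs, and hyperparameters are placeholders, not my actual ones):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Placeholder datasets: dataset1 is the large one, dataset2 the small one.
dataset1 = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
dataset2 = TensorDataset(torch.randn(16, 10), torch.randint(0, 2, (16,)))

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train(dataset, epochs):
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()

# Phase 1: train on the large dataset.
train(dataset1, epochs=5)
# Phase 2: continue training on the small dataset, reusing the same
# model weights and optimizer state (this is where performance drops).
train(dataset2, epochs=5)
```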

My question is: what is the best way to train my model sequentially on two datasets?