Splitting data into train/test sets vs. choosing samples randomly?

Which approach gives a more accurate and more correct evaluation?

I took the notebook from the PyTorch docs that classifies surnames by language with an RNN.

I modified it a bit to test the trained model on a validation set and got about 55% accuracy.

But I’d like to train the model on a training set and test it on a separate validation set, rather than choosing data points randomly.

So I modified it some more.
Now I get only about 30% accuracy and the following confusion matrix:

Can somebody explain that?
Is it caused by splitting the data into two datasets, or is it something else?

This is the confusion matrix from the original notebook:

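Since the original notebook's evaluation was done on random samples drawn from the training data, its confusion matrix gives you biased, overly optimistic results, because the model has already seen those names during training. In your case, it seems that you are overfitting the training data, as explained in your other topic.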
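
To make the difference concrete, here is a minimal sketch of a per-language split followed by separate accuracy measurements on the training and validation parts. It is only an illustration, not the notebook's code: `stratified_split`, `accuracy`, and `val_fraction` are hypothetical names, `category_lines` is assumed to be a dict mapping each language to its list of surnames, the surnames are toy data, and the "model" is a deliberately naive lookup table that memorizes the training names.

```python
import random
from collections import defaultdict


def stratified_split(category_lines, val_fraction=0.2, seed=0):
    """Split each category's names into disjoint train and validation lists."""
    rng = random.Random(seed)
    train, val = defaultdict(list), defaultdict(list)
    for category, lines in category_lines.items():
        shuffled = list(lines)
        rng.shuffle(shuffled)
        # Hold out at least one name when a category has two or more.
        n_val = max(1, int(len(shuffled) * val_fraction)) if len(shuffled) > 1 else 0
        val[category] = shuffled[:n_val]
        train[category] = shuffled[n_val:]
    return dict(train), dict(val)


def accuracy(predict, data):
    """Fraction of (name, category) pairs that `predict` labels correctly."""
    pairs = [(name, cat) for cat, names in data.items() for name in names]
    if not pairs:
        return 0.0
    correct = sum(predict(name) == cat for name, cat in pairs)
    return correct / len(pairs)


# Toy data standing in for the surname lists (made up for illustration).
category_lines = {
    "English": ["Smith", "Jones", "Brown", "Taylor", "Wilson"],
    "Italian": ["Rossi", "Russo", "Ferrari", "Esposito", "Bianchi"],
}
train_data, val_data = stratified_split(category_lines)

# A stand-in "model" that only memorizes training names (extreme overfitting).
memory = {name: cat for cat, names in train_data.items() for name in names}


def predict(name):
    # Unseen names fall back to a fixed guess.
    return memory.get(name, "English")


print("train accuracy:", accuracy(predict, train_data))
print("validation accuracy:", accuracy(predict, val_data))
```

The memorizing model scores perfectly on names it was trained on and much worse on held-out names. That is the same kind of gap you would expect between a confusion matrix built from training samples and one built from a separate validation set.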