Different seeds lead to very different results on the test set

I trained my model several times with different seed settings. I use 70% of the data for training and 30% for testing, and after training I evaluate on the test set.

But different seeds lead to very different test results, like the following:

seed: 41, 93, 142, 194, 245
test: 95%, 90%, 96.67%, 95%, 93.33%

The biggest difference is 6.67 percentage points (90% vs. 96.67%).

Can anyone help me and explain why? I’m very confused, because I thought different seeds should give similar results when the training data and test data are the same.

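I used this function to set the seed, and I trained on CPU only:

import random

import numpy as np
import torch

def setup_seed(seed):
    # Seed every random number generator the training code may draw from.
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no effect on a CPU-only run
    np.random.seed(seed)
    random.seed(seed)
    # The cuDNN flags only matter on GPU; they are harmless on CPU.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False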

Thanks in advance.

A high variance in the final model performance points to an unstable training routine, which might come from different data splits, the model architecture, the model initialization, etc.

You could try to stabilize the training for different seeds by e.g. changing the parameter initialization or making sure the data split is stratified.
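As a rough illustration of what a stratified split means, here is a minimal sketch using only the standard library. The function name `stratified_split`, the 200-sample label list, and the 70/30 class ratio are all hypothetical and not taken from the original post; in practice scikit-learn's `train_test_split` with its `stratify` argument does the same job.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    # Split each class separately, then merge the per-class pieces.
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical imbalanced dataset: 140 samples of class 0, 60 of class 1.
labels = [0] * 140 + [1] * 60
train_idx, test_idx = stratified_split(labels, test_fraction=0.3, seed=0)
```

With this split the test set has the same 70/30 class ratio as the full data, so a change in accuracy between runs is less likely to come from an unlucky class mix in the test set.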
You should not pick the best result based on a specific seed. If you can’t stabilize the training, you could calculate the mean and standard deviation of the accuracy across seeds and report those instead.
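For the five accuracies posted above, the summary could be computed like this. It is a small sketch using the standard library; the variable names are made up here, and the sample standard deviation (dividing by n - 1) is one reasonable choice rather than the only one.

```python
import statistics

# Test accuracies (in percent) from the question, one per seed.
accuracies = [95.0, 90.0, 96.67, 95.0, 93.33]

mean_acc = statistics.mean(accuracies)
std_acc = statistics.stdev(accuracies)  # sample standard deviation (n - 1)

print(f"accuracy: {mean_acc:.2f}% +/- {std_acc:.2f}% over {len(accuracies)} seeds")
```

This gives a mean of about 94% with a standard deviation of roughly 2.5 percentage points, which describes the model more honestly than any single seed's number.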

Thank you very much! I will try your advice later.