I have created 2 CNN models. The first model is my baseline model and the second model is an improvement on top of the first model.

I have observed something odd: the two models produce slightly different results on every run (the gap between them ranges from -0.05% to +0.05%). Sometimes model one does better, and other times model two does.

How can I avoid this? I know I can fix a “seed”, but a single seeded run doesn’t show that the second model outperforms the first in general.

I have done some research. It seems I need to generate a common set of initial weights shared by both models. Then I can compare apples to apples.

For example, if both models are ResNet50-based, the logic should be:

1. Create an empty ResNet50 model and save the state dictionary X;
2. Load the state dictionary X to ResNet50-A;
3. Load the state dictionary X to ResNet50-B;
4. Compare the results between ResNet50-A and ResNet50-B.
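
The four steps above could be sketched roughly as follows. This is only a minimal sketch: `make_model` is a hypothetical helper, and it builds a tiny stand-in network rather than a real ResNet50 so the snippet runs quickly. It also assumes both models have exactly the same architecture.

```python
import copy

import torch
import torch.nn as nn


def make_model() -> nn.Module:
    # Hypothetical tiny stand-in for ResNet50 (assumption, keeps the example fast).
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(8, 10),
    )


# Step 1: create a freshly initialized model and keep a copy of its state dict X.
shared_state = copy.deepcopy(make_model().state_dict())

# Steps 2 and 3: load the same state dict X into both models.
model_a = make_model()
model_b = make_model()
model_a.load_state_dict(shared_state)
model_b.load_state_dict(shared_state)

# Sanity check before step 4: every tensor should match at the start of training.
same_start = all(
    torch.equal(t_a, t_b)
    for t_a, t_b in zip(model_a.state_dict().values(), model_b.state_dict().values())
)
print(same_start)
```

If the second model adds or changes layers, only the layers with matching names and shapes can share weights (for example via `load_state_dict(..., strict=False)`), and the new layers still get their own random initialization.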

I don’t think using a single seed (or the same parameters) would give you a good signal, since this seed (or parameter set) could still be beneficial for one of the models.
To properly check the model accuracy, you could rerun each training script using different seeds and calculate the mean +/- stddev of the achieved accuracy.
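
A rough sketch of that multi-seed comparison is below. The names `summarize` and `compare` are made up for illustration, and the two `fake_train_*` functions are placeholders that return synthetic numbers, not real results. In practice each would seed everything, run the full training script, and return the measured accuracy.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def summarize(accuracies: Sequence[float]) -> Tuple[float, float]:
    # Mean and sample standard deviation over the runs.
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def compare(
    train_a: Callable[[int], float],
    train_b: Callable[[int], float],
    seeds: Sequence[int],
) -> None:
    # Train each model once per seed and report mean +/- stddev.
    for name, train_fn in (("model A", train_a), ("model B", train_b)):
        mean, std = summarize([train_fn(seed) for seed in seeds])
        print(f"{name}: {mean:.4f} +/- {std:.4f} over {len(seeds)} seeds")


# PLACEHOLDERS (assumption): synthetic accuracies standing in for real training runs.
def fake_train_a(seed: int) -> float:
    return 0.9000 + random.Random(seed).uniform(-0.0005, 0.0005)


def fake_train_b(seed: int) -> float:
    return 0.9002 + random.Random(seed + 1000).uniform(-0.0005, 0.0005)


compare(fake_train_a, fake_train_b, seeds=range(5))
```

If the two intervals overlap heavily, the difference between the models is probably within the run-to-run noise.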

I am thinking of using 5 or more different seeds (maybe 30, to reach statistical significance?) to compare these 2 models.

For example, suppose each of my models has 1 million parameters. Should I set seeds like seed(1), seed(1million+1), seed(2million+1), seed(3million+1), seed(4million+1), and seed(5million+1) to avoid overlapping parameters?

The seed values themselves can be sequential for each run. This article doesn’t target your use case exactly, but it also claims that sequential seeds should work.
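
As a quick sanity check, something like the sketch below could be used. As far as I understand, the seed initializes the generator state rather than choosing an offset into one long shared sequence, so neighbouring seeds should already give unrelated streams. The helper `first_values` is made up for this example.

```python
import torch


def first_values(seed: int, n: int) -> torch.Tensor:
    # Draw n uniform values from a fresh CPU generator seeded with `seed`.
    gen = torch.Generator().manual_seed(seed)
    return torch.rand(n, generator=gen, dtype=torch.float64)


stream_0 = first_values(0, 1000)
stream_1 = first_values(1, 1000)

# Sequential seeds should not reproduce the same values.
streams_differ = not torch.equal(stream_0, stream_1)

# Count how many of the first 10 values from seed 1 appear anywhere
# in the first 1000 values from seed 0 (a shifted copy would show up here).
shared = int(torch.isin(stream_1[:10], stream_0).sum().item())

print(streams_differ, shared)
```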

Yes, I understand that the seed values can be sequential. As I understand it, a manual seed function basically fixes the point from which the sequence of random values is generated, and that sequence goes on indefinitely. So I am trying to pick seed values whose sequences do not overlap. I may try even larger distances between the manual seeds. It was a good conversation with you though. Thanks.