The problem is that with seed 1 the results are OK and consistent if I repeat the run many times.
For seed=2 the results are also consistent from one run to another, but they are absolutely NOT OK at all (it is not just a matter of a few percent difference).

Have you experienced this kind of behavior?

Should the seed respect some convention, range, or set of values? When I started research 30 years ago with FORTRAN code, the usable random seeds were special, and if one did not respect the rules, the results were completely crazy. Is that the case here?

No, the random seed can be any integer in PyTorch (within the 64-bit range accepted by torch.manual_seed). It just changes where the RNG starts.
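A quick way to convince yourself that no seed is special: every seed yields a stream with the same statistics, and only the particular numbers differ. A minimal sketch (the helper names are hypothetical) using a dedicated torch.Generator per seed:

```python
import torch


def sample_stats(seed: int, n: int = 100_000) -> tuple[float, float]:
    """Draw n uniform samples from a generator seeded with `seed` and
    return their mean and standard deviation."""
    gen = torch.Generator().manual_seed(seed)
    x = torch.rand(n, generator=gen)
    return x.mean().item(), x.std().item()


def same_stream(seed: int, n: int = 10) -> bool:
    """Two generators given the same seed produce identical numbers."""
    a = torch.rand(n, generator=torch.Generator().manual_seed(seed))
    b = torch.rand(n, generator=torch.Generator().manual_seed(seed))
    return torch.equal(a, b)


if __name__ == "__main__":
    # Small, "round" and arbitrary seeds all give the same statistics:
    # mean close to 0.5 and std close to 1/sqrt(12) ~= 0.2887 for U(0, 1).
    for seed in (0, 1, 2, 3, 314159, 2**40 + 7):
        mean, std = sample_stats(seed)
        print(f"seed={seed:>14d}  mean={mean:.4f}  std={std:.4f}")
        assert same_stream(seed)
```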

It is possible that the training of some networks is not very stable, so that it works for one random seed but not for another.
Also, you might want to check the notes on reproducibility in the PyTorch docs to see whether runs with the same seed will give the same result.
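A minimal sketch of the usual checklist from those notes, wrapped in a hypothetical helper seed_everything; it assumes a script that also uses Python's random module and NumPy. It makes runs with the same seed repeatable, but it cannot make different seeds give the same result.

```python
import os
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    """Seed the RNGs a typical training script uses and ask PyTorch for
    deterministic kernels."""
    random.seed(seed)        # Python's built-in RNG
    np.random.seed(seed)     # NumPy's global RNG (e.g. in data augmentation)
    torch.manual_seed(seed)  # PyTorch RNG on all devices (CPU and CUDA)
    # cuDNN: no benchmark-driven algorithm selection, deterministic algorithms only.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    # Some cuBLAS ops need this when determinism is enforced on CUDA; it only
    # takes effect if it is set before CUDA is initialized.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    # Raise an error whenever an op has no deterministic implementation.
    torch.use_deterministic_algorithms(True)


if __name__ == "__main__":
    seed_everything(2)
    first = torch.randn(3)
    seed_everything(2)
    assert torch.equal(first, torch.randn(3))  # same seed -> same numbers
```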

Thanks @alabanD, I read the doc you mention before posting. It is correct that fixing all the random seeds and determinism settings gives the same result. But my point is that switching from seed 1 to seed 2 completely destroys the generalisation power. So, you point out that the training of some networks is unstable. Do you have any reference on this subject? Thanks.

You might also try seeding the random-number generators with
3, 4, and 5 (and, just in case there is something funny about how
the seeds work, which I strongly doubt, with 314158, 314159,
and 314160 as well).

Can you characterize the goodness of your results with a single
statistic? It would be interesting to see that statistic for a number
of different seeds.
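A sketch of such a sweep over the seeds suggested above, assuming a hypothetical train_and_evaluate() callable that trains from scratch and returns the single statistic; only the seed changes between runs:

```python
import statistics
from typing import Callable, Dict, Iterable

import torch


def seed_sweep(train_and_evaluate: Callable[[], float],
               seeds: Iterable[int]) -> Dict[int, float]:
    """Run the same training recipe once per seed and record one scalar
    'goodness' statistic per run."""
    results: Dict[int, float] = {}
    for seed in seeds:
        torch.manual_seed(seed)  # the only thing that differs between runs
        results[seed] = float(train_and_evaluate())
    return results


def summarize(results: Dict[int, float]) -> str:
    """One line per seed, plus mean / std / min / max over all seeds."""
    values = list(results.values())
    spread = statistics.stdev(values) if len(values) > 1 else 0.0
    lines = [f"seed {s:>7d}: {v:.4g}" for s, v in results.items()]
    lines.append(f"mean={statistics.mean(values):.4g}  std={spread:.4g}  "
                 f"min={min(values):.4g}  max={max(values):.4g}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Stand-in for a real run; replace it with a function that trains the
    # model and returns a test-set statistic.
    def fake_train_and_evaluate() -> float:
        return torch.rand(1).item()

    seeds = [1, 2, 3, 4, 5, 314158, 314159, 314160]
    print(summarize(seed_sweep(fake_train_and_evaluate, seeds)))
```

Looking at the spread over eight or more seeds should make it easier to tell an unstable recipe (wide spread, occasional outliers) from one that is simply poor (all runs bad).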

Hi! Let me elaborate a little more on my "goodness" probe:
As input to my network I have images stored as tensors of dimension C,H,W, with for instance C=5 and H=W=64; the output is a vector of length nbins that represents the p.d.f. of some variable y: p(y). I have train and test sets with y_true, a float vector in [0,1]. My favorite figures of merit are statistical summaries of the distribution dy = (y_predict - y_true)/(1 + y_true), where y_predict is the mean of the bin-center values weighted by the probabilities p(y). So, my statistical summaries are sigma (~1) and the fraction of outliers (eta), defined by |y_predict - y_true| > 5.
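The two summary statistics can be computed from the network output roughly as follows. This is a sketch under assumptions: sigma is taken to be the plain standard deviation of dy, eta is returned as a fraction, and the outlier threshold is a parameter because the units of the 5 above are not spelled out; the function names are hypothetical.

```python
import torch


def point_estimate(probs: torch.Tensor, bin_centers: torch.Tensor) -> torch.Tensor:
    """y_predict: the bin-center values weighted by the predicted p(y).

    probs:       (N, nbins), each row a normalized p.d.f. (e.g. softmax output)
    bin_centers: (nbins,)
    returns:     (N,)
    """
    return probs @ bin_centers


def goodness(probs: torch.Tensor, bin_centers: torch.Tensor,
             y_true: torch.Tensor, outlier_threshold: float) -> tuple[float, float]:
    """Return (sigma, eta) for one evaluation set.

    sigma: standard deviation of dy = (y_predict - y_true) / (1 + y_true)
           (assumption: plain std; a robust estimator could be swapped in)
    eta:   fraction of samples with |y_predict - y_true| > outlier_threshold
           (a fraction in [0, 1]; multiply by 100 for a percentage)
    """
    y_pred = point_estimate(probs, bin_centers)
    dy = (y_pred - y_true) / (1.0 + y_true)
    sigma = dy.std().item()
    eta = ((y_pred - y_true).abs() > outlier_threshold).float().mean().item()
    return sigma, eta


if __name__ == "__main__":
    nbins = 4
    centers = (torch.arange(nbins) + 0.5) / nbins   # bin centers on [0, 1]
    probs = torch.softmax(torch.randn(6, nbins), dim=1)
    y_true = torch.rand(6)
    print(goodness(probs, centers, y_true, outlier_threshold=0.3))
```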

Now, comparing runs with different seeds, for example:
seed=0: sigma=11, eta=1 <== OK
seed=1: about the same as seed=0
seed=2: sigma=15, eta=11 <== Not OK

I have tried other seeds as well: some are OK, others are not.
So I was wondering whether there are "good" seeds, which generate unbiased random sequences, and other seeds that produce biased random series.

There are no "good" and "bad" seeds.
As explained by others before, your model or whole training procedure might not be very stable, so that some random numbers yield good results while others make the model collapse.

You can mostly avoid these effects in supervised learning by using appropriate parameter initialization methods.
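As one example of such an initialization method (an illustration, not necessarily the one meant above), here is Kaiming/He initialization for layers followed by ReLUs, applied recursively with Module.apply; the helper name init_weights is hypothetical:

```python
import torch
from torch import nn


def init_weights(module: nn.Module) -> None:
    """Kaiming (He) normal init for conv/linear weights feeding ReLUs,
    zero biases, and identity-like BatchNorm parameters."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(module.weight, mode="fan_in", nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.BatchNorm2d):
        nn.init.ones_(module.weight)
        nn.init.zeros_(module.bias)


if __name__ == "__main__":
    torch.manual_seed(0)  # the init is still random, so seed it like everything else
    model = nn.Sequential(
        nn.Conv2d(5, 16, kernel_size=3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
        nn.Flatten(), nn.Linear(16 * 8 * 8, 10),
    )
    model.apply(init_weights)  # visits every submodule recursively
    print(model(torch.randn(2, 5, 8, 8)).shape)  # torch.Size([2, 10])
```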
If you are working on a new method and would like to write up the results in a paper, I would recommend using a few random seeds and reporting the number of successful runs as well as crashes.
On the other hand, if you are explicitly using a "good" seed, I would consider this a hack, which makes it hard to draw any useful conclusions about your method.
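Reporting successes and crashes can be as simple as splitting the per-seed results at a threshold. A sketch with hypothetical helper names; what counts as "successful" (threshold and direction) is your choice, and NaN or infinite metrics are treated as crashes:

```python
import math
from typing import Dict, List, Tuple


def split_runs(metric_by_seed: Dict[int, float], threshold: float,
               lower_is_better: bool = True) -> Tuple[List[int], List[int]]:
    """Split seeds into successful and failed runs. A run fails if its metric
    is on the wrong side of `threshold` or is NaN/inf (collapsed training)."""
    ok: List[int] = []
    failed: List[int] = []
    for seed, value in sorted(metric_by_seed.items()):
        passed = math.isfinite(value) and (
            value <= threshold if lower_is_better else value >= threshold)
        (ok if passed else failed).append(seed)
    return ok, failed


def report(metric_by_seed: Dict[int, float], threshold: float,
           lower_is_better: bool = True) -> str:
    """One-line summary suitable for a results table or paper."""
    ok, failed = split_runs(metric_by_seed, threshold, lower_is_better)
    return (f"{len(ok)}/{len(metric_by_seed)} runs succeeded; "
            f"failed seeds: {failed}")
```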

@ptrblck
Thanks a lot, but do not be afraid: I am more concerned with systematics, and I will not pretend to have a better model than others. Now, all of you point out that this is a sign of instability. Good, as I was tracking a bug. Let me elaborate a bit more on the use case: I'm using an Inception-like model and trying to train it with Adam with eps=0.1. I'm using torch, but the TensorFlow doc for this optimizer explicitly states that for Inception one should use eps=0.1 or 1 rather than the default value of 10^-8. By the way, I have experimented with eps=0.01 and the results were completely unstable.
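To see what eps=0.1 does in PyTorch, it helps to look at the very first Adam step, where the bias-corrected moments are simply m_hat = g and v_hat = g^2, so the update is lr * g / (|g| + eps). A small sketch that checks this against torch.optim.Adam (default betas; first_adam_step is a hypothetical helper):

```python
import torch


def first_adam_step(grad: float, lr: float, eps: float) -> float:
    """Size of the first torch.optim.Adam update for a scalar parameter
    whose gradient is `grad` (default betas)."""
    p = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([p], lr=lr, eps=eps)
    p.grad = torch.tensor([grad])
    opt.step()
    return -p.item()  # the parameter moved from 0 to -update


if __name__ == "__main__":
    # Step 1: update = lr * g / (|g| + eps). eps=1e-8 gives ~lr for any g,
    # while eps=0.1 or 1.0 shrinks the step whenever |g| is small relative to eps.
    for eps in (1e-8, 0.1, 1.0):
        for g in (1e-3, 1.0):
            step = first_adam_step(g, lr=1e-3, eps=eps)
            print(f"eps={eps:g}  g={g:g}  step={step:.3e}  "
                  f"formula={1e-3 * g / (abs(g) + eps):.3e}")
```

With eps=1e-8 every parameter moves by roughly lr regardless of its gradient scale, whereas with a large eps small gradients get proportionally smaller steps, so in that regime Adam behaves more like SGD with learning rate lr/eps.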

Maybe you have some advice, e.g. another optimizer (e.g. SGD)…? Note that I have also used AdamW, and regarding the seed behavior it did not give me more stable, seed-independent results.
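If you want to compare optimizers systematically, a small factory keeps everything else fixed. A sketch with a hypothetical make_optimizer; the eps, weight_decay and momentum values are illustrative assumptions, not tuned settings:

```python
import torch
from torch import nn


def make_optimizer(name: str, params, lr: float) -> torch.optim.Optimizer:
    """Build one of the optimizers discussed in this thread. The
    hyperparameter values are illustrative starting points only."""
    name = name.lower()
    if name == "adam":
        return torch.optim.Adam(params, lr=lr, eps=0.1)
    if name == "adamw":
        return torch.optim.AdamW(params, lr=lr, eps=0.1, weight_decay=1e-2)
    if name == "sgd":
        # SGD with Nesterov momentum (the momentum value is an assumption).
        return torch.optim.SGD(params, lr=lr, momentum=0.9, nesterov=True)
    raise ValueError(f"unknown optimizer: {name!r}")


if __name__ == "__main__":
    for opt_name, lr in (("adam", 1e-3), ("adamw", 1e-3), ("sgd", 1e-2)):
        torch.manual_seed(0)  # identical initial weights for every optimizer
        model = nn.Linear(10, 1)
        opt = make_optimizer(opt_name, model.parameters(), lr)
        print(opt_name, type(opt).__name__)
```

Running every optimizer over the same set of seeds separates the effect of the optimizer from the effect of the seed.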

Thanks for the information.
I've found this in the TF docs:

For example, when training an Inception network on ImageNet a current good choice is 1.0 or 0.1. Note that since AdamOptimizer uses the formulation just before Section 2.1 of the Kingma and Ba paper rather than the formulation in Algorithm 1, the "epsilon" referred to here is "epsilon hat" in the paper.

I didn't have a chance to look into the formulas closely, and I'm not sure whether the TF docs are applicable to the PyTorch optimizer.
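Working through the two formulations (my own derivation, assuming torch.optim.Adam follows Algorithm 1, i.e. eps is added to the square root of the bias-corrected second moment as in its documented update rule) gives the relation between the two epsilons:

```latex
% PyTorch (Algorithm 1 of Kingma & Ba): epsilon is added to the bias-corrected sqrt(v)
\theta_t = \theta_{t-1} - \alpha\,\frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon},
\qquad \hat m_t = \frac{m_t}{1-\beta_1^t},\quad \hat v_t = \frac{v_t}{1-\beta_2^t}

% Pull the bias corrections out of the fraction:
\alpha\,\frac{m_t/(1-\beta_1^t)}{\sqrt{v_t}/\sqrt{1-\beta_2^t} + \epsilon}
  = \underbrace{\alpha\,\frac{\sqrt{1-\beta_2^t}}{1-\beta_1^t}}_{\alpha_t}\,
    \frac{m_t}{\sqrt{v_t} + \epsilon\sqrt{1-\beta_2^t}}

% TF AdamOptimizer (formulation just before Section 2.1): epsilon hat on the raw sqrt(v)
\theta_t = \theta_{t-1} - \alpha_t\,\frac{m_t}{\sqrt{v_t} + \hat\epsilon}

% The two updates coincide when
\hat\epsilon = \epsilon\,\sqrt{1-\beta_2^t}
```

With the default beta2 = 0.999, the factor sqrt(1 - beta2^t) is about 0.80 at t = 1000 and above 0.9999 from t = 10000 on, so after the first few thousand steps a PyTorch eps of 0.1 or 1.0 should behave essentially like the same TF epsilon hat. Early in training, the same PyTorch eps corresponds to a smaller TF epsilon hat.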

As a general question: were you able to use the default Adam optimizer and get "stable" results?

Well, to answer your question, I realize that I certainly do not master every subtlety of ML.
Concerning the model, by Inception-like I mean a network with 1 conv layer followed by 5 Inception cells followed by 3 fully connected layers. Each Inception cell gathers 1x1, 3x3 and possibly 5x5 kernel convolutions processed in parallel. This is in fact the model used by some colleagues: https://arxiv.org/pdf/1806.06607.pdf.
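A rough PyTorch sketch of that architecture, assuming C=5, H=W=64 inputs and an nbins-dimensional output as described earlier in the thread. The branch width, ReLU activations, pooling placement and hidden sizes are hypothetical choices, not taken from the linked paper; the network returns logits, and a softmax turns them into p(y):

```python
import torch
from torch import nn


class InceptionCell(nn.Module):
    """1x1, 3x3 and 5x5 convolutions applied in parallel to the same input;
    their outputs are concatenated along the channel axis."""

    def __init__(self, in_ch: int, branch_ch: int) -> None:
        super().__init__()
        # padding = kernel_size // 2 keeps H and W unchanged in every branch
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, branch_ch, kernel_size=k, padding=k // 2) for k in (1, 3, 5)]
        )
        self.act = nn.ReLU()
        self.out_ch = 3 * branch_ch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([self.act(b(x)) for b in self.branches], dim=1)


class InceptionLikeNet(nn.Module):
    """1 conv layer -> 5 Inception cells -> 3 fully connected layers.
    Returns logits over `nbins` bins."""

    def __init__(self, nbins: int, in_ch: int = 5, width: int = 32) -> None:
        super().__init__()
        layers = [nn.Conv2d(in_ch, width, kernel_size=3, padding=1), nn.ReLU()]
        ch = width
        for i in range(5):
            cell = InceptionCell(ch, width)
            layers.append(cell)
            ch = cell.out_ch
            if i in (1, 3):
                layers.append(nn.AvgPool2d(2))  # 64 -> 32 -> 16 for 64x64 inputs
        self.features = nn.Sequential(*layers)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(ch, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, nbins),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))


if __name__ == "__main__":
    model = InceptionLikeNet(nbins=100)
    x = torch.randn(8, 5, 64, 64)           # batch of C=5, H=W=64 images
    p = torch.softmax(model(x), dim=1)      # (8, 100), each row sums to 1
    print(p.shape, p.sum(dim=1))
```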
So, I found some Adam settings that are OK: the train and test losses decrease epoch after epoch, and at a certain point they diverge, i.e. overfitting sets in, so I stop the training phase. Since I save intermediate states, this is quite easy. Now, I do not know what auxiliary loss you are referring to.
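The manual "stop when train and test losses diverge" step can be automated by keeping the weights with the lowest held-out loss. A sketch with a hypothetical train_keep_best helper, assuming list-like batches of (x, y) pairs; patience and max_epochs are illustrative:

```python
import copy
from typing import Callable, List, Optional, Tuple

import torch
from torch import nn

Batch = Tuple[torch.Tensor, torch.Tensor]


def train_keep_best(
    model: nn.Module,
    optimizer: torch.optim.Optimizer,
    loss_fn: Callable[[torch.Tensor, torch.Tensor], torch.Tensor],
    train_batches: List[Batch],
    val_batches: List[Batch],
    max_epochs: int = 100,
    patience: int = 5,
    checkpoint_path: Optional[str] = None,
) -> Tuple[int, float]:
    """Train, remember the weights with the lowest held-out loss, and stop once
    that loss has not improved for `patience` epochs (overfitting).
    Returns (best_epoch, best_loss); the model is left at its best weights."""
    best_loss, best_epoch = float("inf"), -1
    best_state = copy.deepcopy(model.state_dict())
    for epoch in range(max_epochs):
        model.train()
        for x, y in train_batches:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(model(x), y).item() for x, y in val_batches)
            val_loss /= len(val_batches)
        if val_loss < best_loss:
            best_loss, best_epoch = val_loss, epoch
            best_state = copy.deepcopy(model.state_dict())
            if checkpoint_path is not None:
                torch.save(best_state, checkpoint_path)  # intermediate state on disk
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    model.load_state_dict(best_state)  # roll back to the best epoch
    return best_epoch, best_loss
```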
Another piece of information I got today: with SGD I get the same behaviour.