I am currently using the following random seeds:
I would have expected this to always produce the same results given identical hyperparameters. Unfortunately it does not. Am I missing something?
Do you mean that the results aren’t identical to each other, or that re-seeding doesn’t allow you to reproduce a set of values using the same seeding mechanism? It might help to provide a more concrete/complete example.
Random tensors created on CUDA rely on the Philox counter-based RNG. The CPU RNG uses Mersenne Twister.
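CPU and CUDA use different RNGs because they generate values in fundamentally different ways. On a CPU we can draw values one after another from a sequential generator, but on CUDA we use “counter”-based RNGs, where each value depends only on the seed and its position, so all values can be populated in parallel.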
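To make the sequential vs. counter-based distinction concrete, here is a toy sketch in plain Python. It is not Philox and not what PyTorch does internally: `counter_based_value` is a hypothetical helper that hashes `(seed, counter)` purely for illustration, and Python's `random.Random` (which is Mersenne Twister based) stands in for a sequential CPU generator.

```python
import hashlib
import random


def counter_based_value(seed: int, counter: int) -> float:
    """Toy counter-based generator (illustration only, not Philox).

    The value depends only on (seed, counter), so any element of the
    stream can be computed on its own, in any order or in parallel.
    """
    digest = hashlib.sha256(f"{seed}:{counter}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def sequential_values(seed: int, n: int) -> list:
    """Sequential generator: each draw advances a shared internal state."""
    rng = random.Random(seed)  # Python's random module uses Mersenne Twister
    return [rng.random() for _ in range(n)]


# Counter-based: computing elements out of order gives the same stream.
in_order = [counter_based_value(42, i) for i in range(4)]
out_of_order = {i: counter_based_value(42, i) for i in (3, 1, 0, 2)}

# Sequential: reproducible too, but only by replaying draws from the start.
first_run = sequential_values(42, 4)
second_run = sequential_values(42, 4)
```

Both schemes are reproducible for a fixed seed, but they produce different streams, so you should not expect the same seed to give the same values on CPU and on CUDA.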
Thanks for your reply. Sorry for not being concrete. What I mean is that I expect to get the same outputs/loss after each epoch when I restart training with the same hyperparameters, i.e. I expect the network to reproduce its previous results.
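A minimal way to test that expectation might look like the sketch below. It assumes a tiny CPU-only model, and `seed_everything` and `run_once` are hypothetical helper names, not part of PyTorch or of the setup described above. The cuDNN flags only matter on GPU, and a real training loop may have other sources of nondeterminism (data loader workers, nondeterministic CUDA kernels) that this does not cover.

```python
import random

import torch


def seed_everything(seed: int) -> None:
    """Seed the RNGs this sketch touches (hypothetical helper)."""
    random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU RNG and any CUDA devices
    # Only relevant on GPU: prefer deterministic cuDNN kernels
    # and disable the autotuner, which may pick different kernels per run.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


def run_once(seed: int) -> float:
    """Build a tiny model, take one SGD step, return the loss after it."""
    seed_everything(seed)
    model = torch.nn.Linear(4, 1)   # weights initialised from the seeded RNG
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.randn(8, 4)           # synthetic inputs, also from the seeded RNG
    y = torch.randn(8, 1)

    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Loss after the update, so the training step itself is part of the check.
    return torch.nn.functional.mse_loss(model(x), y).item()


same_a = run_once(0)
same_b = run_once(0)
different = run_once(1)
```

If `same_a` and `same_b` match here but your full training run still diverges between restarts, the difference is likely coming from something outside the seeding itself.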