Different results based on cudatoolkit version

Hi everyone :slight_smile:

I just noticed that when I train a network in an anaconda environment with cudatoolkit=10.2.89, I get different results to when I train the exact same network in a different environment with cudatoolkit=10.1.243 (everything else is the same). Is this behaviour to be expected?

I seed everything and can reproduce the results within each environment; they just differ between the two environments…

Any help is very much appreciated!

All the best
snowe

Hi,

I am afraid this is expected. Results (especially on CUDA) are reproducible only for a fixed hardware and library version, and only if the deterministic flag is set to true in PyTorch.
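For reference, here is a minimal sketch of what such a setup might look like. It assumes that the "deterministic flag" refers to `torch.backends.cudnn.deterministic`; the helper name `seed_everything` is made up for illustration and is not a PyTorch API. Even with all of this in place, bitwise-identical results are only expected on the same hardware and library versions.

```python
import random

import torch


def seed_everything(seed: int = 0) -> None:
    """Hypothetical helper: seed the RNGs and request deterministic cuDNN kernels."""
    random.seed(seed)                 # Python's built-in RNG
    torch.manual_seed(seed)           # PyTorch CPU generator
    torch.cuda.manual_seed_all(seed)  # all CUDA devices; silently ignored without a GPU
    # Assumption: these are the flags meant by "deterministic flag".
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False  # disable autotuning, which may pick different kernels


# Re-seeding reproduces the same random draws within one environment.
seed_everything(42)
a = torch.rand(3)
seed_everything(42)
b = torch.rand(3)
print(torch.equal(a, b))
```

If you also use NumPy, you would seed it as well with `np.random.seed(seed)`.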

If you change any of these, the order of floating-point operations can change, which leads to slightly different floating-point results. These differences are then amplified by the model depth and by gradient descent, so the final result can be completely different (even though, if your model is stable, the final loss value should be similar).
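The effect of operation order can be seen without any GPU. The sketch below uses plain Python floats; `naive_sum` is an illustrative helper that only stands in for a reduction whose order may differ between library versions, and is not how CUDA kernels are actually implemented.

```python
import random

# Floating-point addition is not associative: the grouping changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False
print(left, right)


def naive_sum(xs):
    """Accumulate left to right, one rounding step per addition."""
    total = 0.0
    for x in xs:
        total += x
    return total


# The same values summed in two different orders, mimicking a reduction
# that different kernels might perform in a different order.
rng = random.Random(0)
values = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-4, 4) for _ in range(10_000)]
forward = naive_sum(values)
backward = naive_sum(reversed(values))
print(forward - backward)  # typically tiny but nonzero
```

Each discrepancy is on the order of rounding error, but in training it feeds into every subsequent layer and update step, which is how the runs drift apart.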


Hi @albanD, thank you for your response!