Should a PyTorch-trained model produce the *exact* same results on Windows 10 / Ubuntu?

Hi all,

I’m using PyTorch 0.4 on my Windows 10 laptop but deploying on Ubuntu Linux 16.04 (also PyTorch 0.4). Both machines use CUDA 9.0 (V9.0.176), yet I’m getting very (very) slightly different results. The differences are enough to change some classifications in my multi-class classification problem, but I’m mainly bothered that inference is not bit-exact.
Is this behavior something I should expect? Can it be fixed?

Thanks,
Ran

Hi,

I guess this is expected. The default cuDNN algorithms are not deterministic.
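One common reason for this kind of nondeterminism (general floating-point behavior, not something specific to this setup) is that floating-point addition is not associative: a kernel that accumulates the same values in a different order can produce a result that differs in the last bits. A minimal pure-Python sketch, with values chosen only for illustration:

```python
# Floating-point addition is not associative: the grouping (and hence the
# order of accumulation) changes the rounded result.
a, b, c = 0.1, 0.2, 0.3  # illustrative values, not taken from any model

left = (a + b) + c   # add a and b first, then c
right = a + (b + c)  # add b and c first, then a

print(left, right)
print(left == right)  # False: same inputs, different summation order
```

The difference here is one unit in the last place, which is the same scale of discrepancy that can flip a borderline classification.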
PyTorch CPU random number generation should be the same across platforms, but I’m not sure the CUDA one will be. The same goes for Python/NumPy random.
I’m afraid it’s going to be very time-consuming to make it bit-exact, and you might end up with a non-negligible slowdown from not being able to use cuDNN kernels and other fast algorithms that are not bit-perfect.
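If you want to try anyway, here is a minimal sketch of the settings people usually start with. It assumes the `torch.backends.cudnn.deterministic` and `torch.backends.cudnn.benchmark` flags are available in your build, and `seed_everything` is a hypothetical helper name, not a PyTorch function. This should make runs repeatable on one machine; it is not a guarantee of bit-identical results between Windows and Ubuntu or between different GPUs.

```python
import random

import numpy as np
import torch


def seed_everything(seed=0):
    """Hypothetical helper: seed every RNG in play and ask cuDNN for
    deterministic algorithms."""
    random.seed(seed)             # Python's built-in RNG
    np.random.seed(seed)          # NumPy's global RNG
    torch.manual_seed(seed)       # PyTorch CPU RNG
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)  # RNGs on all visible GPUs
    # Prefer deterministic cuDNN algorithms; this may be slower.
    torch.backends.cudnn.deterministic = True
    # Disable the autotuner, which can pick different algorithms per run.
    torch.backends.cudnn.benchmark = False


seed_everything(0)
first = torch.rand(3)
seed_everything(0)
second = torch.rand(3)
print(torch.equal(first, second))  # same seed, same machine
```

Call `seed_everything` once at the start of the script, before building the model or loading data. Expect some slowdown with the deterministic flag on, and note that some CUDA operations may remain nondeterministic regardless of these settings.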