Reproducibility in numerical operations


I have noticed that, even though I am fixing the seed, there are small differences between runs when I multiply two tensors. More specifically, I have found a case where I perform an element-wise multiplication of two tensors: in one run, torch.sigmoid(torch.tensor(0.0000)) * torch.sigmoid(torch.tensor(0.1744)) returns 0.2717, and in another run it returns 0.2718. I am quite confused and don't really know how to fix it. It matters because these small differences accumulate and cause larger differences in the long term. I set the same seed in both experiments and both run on the same GPU model.

I have investigated the issue further and apparently the difference is not in the multiplication but in the output of a loss function. The difference is quite small and cannot be captured with the precision of the print I wrote, but in some cases I get 0.0053490624 vs. 0.0053490633. Could this be fixed simply by changing the data type?

Thanks! :slight_smile:

You are most likely seeing the expected small errors caused by the limited floating point precision and a potentially different order of operations inside the operators you are using.
Take a look at the Reproducibility docs to see how deterministic algorithms can be enabled (which could be slower, but should yield the same results).
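To see why the order of operations matters, here is a minimal pure-Python sketch: floating point addition is not associative, so two reductions over the same values can disagree in the last bits depending on how the work is split (which is exactly what parallel GPU kernels may do nondeterministically).

```python
# Floating point addition is not associative: the same three values
# summed in a different order give two different float64 results.
s1 = (0.1 + 0.2) + 0.3
s2 = 0.1 + (0.2 + 0.3)
print(s1)        # 0.6000000000000001
print(s2)        # 0.6
print(s1 == s2)  # False
```

The mismatch lives in the last bit of the mantissa, which is the same order of magnitude as the differences you are seeing in your loss values.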
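As a sketch of the settings from those docs (assuming a recent PyTorch release; on CUDA the cuBLAS environment variable must be set before the first CUDA call):

```python
import os
# Required for deterministic cuBLAS GEMMs on CUDA >= 10.2;
# must be set before CUDA is initialized.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import torch

torch.manual_seed(0)                      # fix the RNG seed
torch.use_deterministic_algorithms(True)  # raise an error on nondeterministic ops
torch.backends.cudnn.benchmark = False    # disable cuDNN autotuning
```

With `use_deterministic_algorithms(True)`, any op that has no deterministic implementation will raise an error instead of silently producing run-to-run differences.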

Increasing the dtype width (e.g. by casting the tensors to float64) will increase the precision at a performance penalty (on GPUs this could cause a significant slowdown).
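For scale, here is a small pure-Python check (using `struct` to emulate float32 rounding) showing that your two loss values are only about two float32 ULPs (units in the last place) apart, i.e. right at the edge of float32 resolution:

```python
import struct

def to_f32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack('f', struct.pack('f', x))[0]

def f32_ulp(x):
    """Spacing between x and the next larger float32 value."""
    bits = struct.unpack('I', struct.pack('f', x))[0]
    return struct.unpack('f', struct.pack('I', bits + 1))[0] - to_f32(x)

a, b = 0.0053490624, 0.0053490633  # the two observed loss values
print(to_f32(b) - to_f32(a))       # difference between them in float32
print(f32_ulp(a))                  # size of one float32 step at this magnitude
```

In float64 the ULP at this magnitude is around 1e-18, so casting to float64 pushes the rounding noise far below what you observed, though the order-of-operations nondeterminism itself is only removed by the deterministic-algorithm settings above.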