The same deep network, the same input, and the same code produce results that are inconsistent after the decimal point on different graphics cards. At first I thought the random seed might be the cause, but I still get inconsistent results with an identical random seed. The difference only appears after the decimal point (2070 vs. 2080 Ti or 3060). Is this normal?

What should I do to make the results the same?

It could be expected, depending on the relative error and the used operations.

E.g. a change in the order of operations creates mismatches due to the limited floating point precision:

```
import torch

x = torch.randn(100, 100)
s1 = x.sum()           # sum all elements in one call
s2 = x.sum(0).sum(0)   # sum each column first, then sum the partial sums
print((s1 - s2).abs())
# tensor(3.0518e-05)
```
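
The same effect can be reproduced in plain Python, without any GPU involved, because floating point addition is simply not associative; changing the grouping changes the rounding at each step:

```python
# Floating point addition is not associative: regrouping the same three
# terms produces a slightly different rounded result.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)       # False
print(abs(a - b))   # tiny, on the order of 1e-16
```

Different GPUs (and different kernel implementations on the same GPU) can accumulate the same values in a different order, which is enough to change the last digits.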

But I can get the same result using your code on different GPUs with an identical random seed. That only shows that once the random numbers are fixed, the random results are consistent. I don't think this answers my question; I still cannot figure out why my results are different.

No, it doesn't, and it would still depend on the used algorithm. I.e., if the `sum` kernel produces deterministic results, your comparison is valid; if not, different results are expected.
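
As a partial remedy, PyTorch lets you opt in to deterministic kernels. A minimal sketch (note: this makes results reproducible from run to run on the *same* hardware and software stack; it does not guarantee bitwise-identical results across different GPU models such as a 2070 vs. a 3060):

```python
import torch

# Seed all RNGs so random inputs are reproducible.
torch.manual_seed(0)

# Ask PyTorch to use deterministic implementations where available;
# ops that have no deterministic variant will raise an error instead
# of silently producing run-to-run varying results.
torch.use_deterministic_algorithms(True)

# cuDNN-specific switches, relevant for conv layers on GPU:
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```

Some CUDA ops additionally require setting the `CUBLAS_WORKSPACE_CONFIG` environment variable before launch; see the PyTorch reproducibility notes for details.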