This comes from the limited precision of floating point numbers.
Each operation on float32 has a relative precision of roughly 1e-6, and accumulating a large number of them can lead to big differences.
The same applies to float64, where the error starts around 1e-12 and goes up from there.
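
To make the accumulation effect concrete, here is a minimal sketch in plain Python. It assumes float32 can be emulated by rounding through `struct` after every addition, so that it runs without NumPy or PyTorch; the helper names `to_float32` and `accumulate` are made up for this illustration.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate(step: float, n: int, as_float32: bool) -> float:
    """Add `step` to a running total n times.

    With as_float32=True, the step and the running total are rounded
    to float32 after every addition, emulating float32 arithmetic.
    """
    if as_float32:
        step = to_float32(step)
    total = 0.0
    for _ in range(n):
        total += step
        if as_float32:
            total = to_float32(total)
    return total

n = 1_000_000
exact = 0.1 * n  # reference value, approximately 100000

sum32 = accumulate(0.1, n, as_float32=True)
sum64 = accumulate(0.1, n, as_float32=False)

rel_err32 = abs(sum32 - exact) / exact
rel_err64 = abs(sum64 - exact) / exact

print(f"float32: {sum32!r}  relative error {rel_err32:.2e}")
print(f"float64: {sum64!r}  relative error {rel_err64:.2e}")
```

On a typical machine the float32 total should end up much further from the reference than the single-operation precision suggests, while the float64 total stays far closer.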

Hello AlbanD,
Thank you for your quick response and time.
Maybe I do not fully understand. Shouldn’t pytorch and numpy operations yield the same values when initialized with the same floating point precision, assuming the rounding rules are equal?

Unfortunately, no.
The reason is that floating point operations are not associative: (a + b) + c != a + (b + c). So any difference in the order in which things are accumulated will lead to such discrepancies.
For these ops, both pytorch and numpy use multithreading, but because they split the work in slightly different ways, you see these differences.
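
A small sketch of both points, in plain Python (float64). The chunked sum only mimics the idea of several workers each producing a partial sum; it is an assumption made for illustration and not how PyTorch or NumPy actually divide the work. The function names are hypothetical.

```python
import math
import random

# Floating point addition is not associative, even for three values.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print("associative here?", left == right, repr(left), repr(right))

def sequential_sum(values):
    """Accumulate left to right, as a single worker would."""
    total = 0.0
    for v in values:
        total += v
    return total

def chunked_sum(values, num_chunks):
    """Sum each chunk separately, then add the partial sums together."""
    size = math.ceil(len(values) / num_chunks)
    partials = [
        sequential_sum(values[i:i + size])
        for i in range(0, len(values), size)
    ]
    return sequential_sum(partials)

random.seed(0)
values = [random.uniform(0.0, 1.0) for _ in range(100_000)]

one_worker = sequential_sum(values)
four_workers = chunked_sum(values, 4)
eight_workers = chunked_sum(values, 8)

# The three totals agree to many digits but are typically not bit-identical.
print(repr(one_worker))
print(repr(four_workers))
print(repr(eight_workers))
```

The same data summed with a different chunking usually differs only in the last few digits, which is why comparing results across libraries with a tolerance (for example `math.isclose`) is more appropriate than testing for exact equality.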

Note that some operations are not even deterministic (usually on the GPU), and running them twice in a row won’t give you bit-identical results. See the note in the doc about determinism if you want to learn more.