Why does index_add_ and scatter_add_ induce non-deterministic behavior on the CUDA backend?

Basic question, but I'm curious why the order of additions at a given location in the target tensor can change the resulting value at that location across different calls to index_add_ or scatter_add_.

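scatter_add_ and index_add_ both use atomic operations (atomicAdd, as seen e.g. here), as described in the Reproducibility docs. When several source elements map to the same target location, the order in which the atomic adds are applied is not fixed from one call to the next. Floating-point addition is not associative, so different orders can round differently, which introduces these small variations.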
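To make the ordering effect concrete, here is a minimal sketch in plain Python (no GPU or PyTorch needed). The helper `accumulate` and the sample values are made up for illustration; it only mimics adding the same contributions into one location in two different orders, which is roughly what can happen when atomic adds land in a different sequence. The values are chosen to exaggerate the effect; in practice the differences are usually in the last few bits.

```python
def accumulate(values, order):
    """Add values[i] into a single accumulator, visiting indices in `order`."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Three contributions that all target the same output location (hypothetical data).
values = [1e16, 1.0, -1e16]

# Order A: (1e16 + 1.0) - 1e16. The 1.0 is lost to rounding when added to 1e16.
sum_a = accumulate(values, [0, 1, 2])

# Order B: (1e16 - 1e16) + 1.0. The large terms cancel first, so the 1.0 survives.
sum_b = accumulate(values, [0, 2, 1])

print(sum_a, sum_b)
print("orders agree:", sum_a == sum_b)
```

The same set of numbers gives two different sums depending only on the order of accumulation, so any kernel that does not fix that order can produce slightly different results between calls.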