Why do index_add_ and scatter_add_ induce non-deterministic behavior on the CUDA backend?

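Both scatter_add_ and index_add_ use atomic operations (atomicAdd, as seen e.g. here), as described in the Reproducibility docs. When several threads add into the same output element, the order in which their atomic adds are committed is not fixed and can change from run to run. Because floating-point addition is not associative, a different order can round differently, which introduces these small variations between otherwise identical runs.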
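To illustrate why the commit order matters, here is a minimal pure-Python simulation. It does not use PyTorch or CUDA: `scatter_add` below is a hypothetical helper that accumulates `src[i]` into `out[index[i]]`, and the `order` argument stands in for the unpredictable order in which atomicAdd calls win the race on a GPU. Python floats are float64 rather than the float32 typical on GPUs, but the rounding effect is the same in principle.

```python
import random

def scatter_add(out_size, index, src, order):
    """Hypothetical simulation: add src[i] into out[index[i]], visiting i in `order`.

    On a GPU the order is decided by thread scheduling; here it is explicit.
    """
    out = [0.0] * out_size
    for i in order:
        out[index[i]] += src[i]
    return out

# Three updates to the same slot: same values, two different commit orders.
vals = [1e16, 1.0, -1e16]
a = scatter_add(1, [0, 0, 0], vals, [0, 1, 2])  # 1.0 is absorbed by 1e16 first
b = scatter_add(1, [0, 0, 0], vals, [0, 2, 1])  # 1e16 cancels before 1.0 is added
print(a, b)

# Many colliding updates with mixed magnitudes, shuffled as a scheduler might.
rng = random.Random(0)
src = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-8, 8) for _ in range(10_000)]
index = [0] * len(src)
results = set()
for seed in range(20):
    order = list(range(len(src)))
    random.Random(seed).shuffle(order)
    results.add(scatter_add(1, index, src, order)[0])
print(len(results), "distinct sums from 20 orderings")
```

The first pair shows the effect at its most extreme; in real scatter_add_/index_add_ workloads the differences are usually only in the last few bits, which is why they show up as small run-to-run variations.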