Hi,

If I run scatter_add twice in exactly the same setting, I get a non-negligible difference between the two results. This is surprising: both results come from exactly the same operations, so they should be identical. Here is code to reproduce this behavior:

```
import torch

splat_1 = torch.zeros(1, 8, 80).cuda()
splat_2 = torch.zeros(1, 8, 80).cuda()
for i in range(100):
    # The same 40000 random indices, repeated for each of the 8 rows
    indices = torch.randint(0, 80, (40000,)).unsqueeze(0).repeat(8, 1).cuda()
    feat = torch.rand(8, 40000).cuda()
    # Apply the identical scatter_add to both accumulators
    splat_1[0] = splat_1[0].scatter_add(1, indices, feat)
    splat_2[0] = splat_2[0].scatter_add(1, indices, feat)
# Sum of squared differences between the two accumulators
print(((splat_1 - splat_2) ** 2).sum())
```

On average, I get a summed squared difference of about 0.04.

Is there an explanation for this? I assume it comes from accumulated floating-point rounding error, but since these are exactly the same operations, shouldn't the error accumulate in exactly the same way for both splat_1 and splat_2?
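
That reasoning assumes the additions happen in the same order in both runs. The minimal pure-Python sketch below shows why the order matters. Floating-point addition is not associative, so adding the same three numbers in two different orders can give two different totals. The `accumulate` helper is hypothetical. It only models the idea that several additions into one output slot might land in a different order from run to run. It is not how PyTorch implements scatter_add.

```python
def accumulate(values, order):
    """Add values into a running total in the given order.

    Hypothetical model: each index in `order` is one addition into the same
    output slot, and the order may differ between runs.
    """
    total = 0.0
    for i in order:
        total += values[i]
    return total


values = [0.1, 0.2, 0.3]
run_a = accumulate(values, [0, 1, 2])  # ((0.1 + 0.2) + 0.3)
run_b = accumulate(values, [2, 1, 0])  # ((0.3 + 0.2) + 0.1)

# Same values, same number of additions, different grouping of the rounding
print(run_a, run_b, run_a == run_b)  # prints: 0.6000000000000001 0.6 False
```

The same effect applies to float32 sums over 40000 elements per row, where many more roundings can differ if the addition order is not fixed.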

Thanks in advance for your help.

Samuel