When I run the same indexed assignment (`z[y] += x`) twice on the same data on the GPU, the two results differ. On the CPU the two runs match, as expected. What am I missing here? Is indexed assignment on the GPU non-deterministic somehow?

```
import torch

n = 100
y = torch.randint(0, n, (n,))  # random row indices; may contain repeats

# CPU: run the same indexed assignment twice and compare
store = []
x = torch.rand(n, 3)
for _ in range(2):
    z = torch.zeros_like(x)
    z[y] += x
    store.append(z)
print((store[0] == store[1]).all())

# GPU: same experiment with x on the CUDA device
store = []
x = torch.rand(n, 3).cuda()
for _ in range(2):
    z = torch.zeros_like(x)
    z[y] += x
    store.append(z)
print((store[0] == store[1]).all())
# console output ->
# tensor(True)
# tensor(False, device='cuda:0')
```
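
One thing worth checking in the snippet above is whether `y` contains repeated indices, since `torch.randint` samples with replacement. With repeats, `z[y] += x` writes several rows of `x` to the same row of `z`, and as far as I understand the PyTorch documentation, which write wins is not guaranteed on CUDA. The following is a minimal sketch, not a confirmed diagnosis: it counts the repeated targets and uses `index_add_` to accumulate every row instead of overwriting. The seed and the names `num_unique` and `num_duplicates` are my own additions for illustration, and the example runs on the CPU only.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed so the check is repeatable
n = 100
y = torch.randint(0, n, (n,))
x = torch.rand(n, 3)

# Count how many writes land on a row that another write also targets.
num_unique = torch.unique(y).numel()
num_duplicates = n - num_unique
print("writes to already-targeted rows:", num_duplicates)

# index_add_ sums every row of x into z[y[i]], so repeated indices
# accumulate instead of overwriting each other.
z = torch.zeros_like(x)
z.index_add_(0, y, x)

# Every row of x is added exactly once, so the column sums agree
# up to floating-point rounding.
print(torch.allclose(z.sum(dim=0), x.sum(dim=0)))
```

Note that on CUDA `index_add_` is itself documented as potentially non-deterministic in floating point, so bitwise-equal results may additionally need `torch.use_deterministic_algorithms(True)`; I have not verified that on the setup in the question.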