Hi, when I run the following code in a Jupyter notebook cell multiple times, the printed value of `m.weight[0, 0]` can be different each time. This happens only when the code runs on CUDA.

I’ve recorded a short video of it: here

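```
import torch
import numpy as np

# Fixed seed, so the indices and initial weights are identical on every run
np.random.seed(0)
x_np = np.random.randint(0, 5, (30, 2))  # 60 indices drawn from 0..4, so many repeats
weight_np = np.random.randn(5, 8).astype(np.float32)

# Embedding with max_norm=1: looked-up rows are renormalized in place during forward
m = torch.nn.Embedding(5, 8, max_norm=1)
m.weight.data = torch.tensor(weight_np)
x = torch.tensor(x_np)

# Move the module and the input to the GPU
m = m.cuda()
x = x.cuda()

y = m(x)  # the forward pass modifies m.weight
print(m.weight[0, 0])  # this value changes between runs on CUDA
```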
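
To make clear what I expect to happen: as far as I understand the docs, `max_norm` rescales every looked-up row whose L2 norm exceeds `max_norm`, and it does so by modifying `m.weight` in place. Below is a plain-Python sketch of that behaviour. It is not PyTorch's actual implementation; `renorm_rows` and the toy values are made up for illustration, and I am assuming each referenced row is rescaled exactly once, no matter how often its index repeats.

```python
import math

def renorm_rows(weight, indices, max_norm=1.0):
    """Rescale each referenced row whose L2 norm exceeds max_norm (hypothetical helper)."""
    for i in sorted(set(indices)):  # assumption: every referenced row is handled once
        norm = math.sqrt(sum(v * v for v in weight[i]))
        if norm > max_norm:
            scale = max_norm / norm
            weight[i] = [v * scale for v in weight[i]]
    return weight

# Toy data, not the values from my snippet above
weight = [[3.0, 4.0], [0.3, 0.4], [6.0, 8.0]]
indices = [0, 0, 1, 0, 1]  # repeated indices; row 2 is never looked up

once = renorm_rows([row[:] for row in weight], indices)
twice = renorm_rows([row[:] for row in once], indices)

# Row 0 is scaled down to norm 1, rows 1 and 2 are left unchanged
print([math.hypot(*row) for row in once])

# A second pass should change nothing beyond floating-point rounding
print(all(math.isclose(a, b, rel_tol=1e-9)
          for ra, rb in zip(once, twice)
          for a, b in zip(ra, rb)))
```

Under this reading, the result depends only on the initial weights and on which indices appear, so with a fixed seed I would expect `m.weight[0, 0]` to be the same on every run.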

Did I do something wrong, or is this a bug? Thanks for any help!