Embedding weight renormalization is inconsistent on CUDA?

Hi, when I run the following code in a Jupyter notebook cell multiple times, the printed value can differ from run to run. This happens only when the module and input are on CUDA; on the CPU the result is consistent.
I’ve recorded a short video of it: here

import torch
import numpy as np


x_np = np.random.randint(0, 5, (30, 2))
weight_np = np.random.randn(5, 8).astype(np.float32)

m = torch.nn.Embedding(5, 8, max_norm=1)
m.weight.data = torch.tensor(weight_np)
x = torch.tensor(x_np)

m = m.cuda()
x = x.cuda()

y = m(x)

print(m.weight[0, 0])
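
To rule out the unseeded random data as the source of the difference, a minimal sketch along the following lines may help. It fixes the seed and compares the weight after the forward pass against a reference computed by hand. The reference rule (scale each looked-up row once when its L2 norm exceeds `max_norm`) and the `1e-7` epsilon are assumptions about what `max_norm` does, not something confirmed here, and the names `expected`, `actual`, and `max_diff` are only illustrative.

```python
import torch

torch.manual_seed(0)  # fixed seed so every run starts from the same data

weight = torch.randn(5, 8)
x = torch.randint(0, 5, (30, 2))
max_norm = 1.0

# Reference (assumed behaviour): each row that is looked up and whose
# L2 norm exceeds max_norm is scaled exactly once.
expected = weight.clone()
for i in torch.unique(x).tolist():
    norm = expected[i].norm(p=2)
    if norm > max_norm:
        expected[i] = expected[i] * (max_norm / (norm + 1e-7))  # epsilon is an assumption

m = torch.nn.Embedding(5, 8, max_norm=max_norm)
with torch.no_grad():
    m.weight.copy_(weight)  # load the fixed weights without touching .data

# Use CUDA when it is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
m = m.to(device)
y = m(x.to(device))  # the forward pass renormalizes m.weight in place

actual = m.weight.detach().cpu()
max_diff = (actual - expected).abs().max().item()
print(device, max_diff)
```

If `max_diff` stays near zero on the CPU but changes between runs on CUDA, the inconsistency would come from the renormalization step rather than from the input data.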

Did I do something wrong, or is this a bug? Thanks for any help!