BiLSTM incorrect on MPS?

Hi, I have been testing various NLP models on MPS and have observed some strange behavior. In particular, the bidirectional LSTM does not seem to work correctly on MPS.

See the code below:

import torch
import torch.nn as nn

print("Torch version: ", torch.__version__)
torch.manual_seed(123)
embeddings = torch.rand(3,1,2)  # L, B, E

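model = nn.LSTM(input_size=2, hidden_size=2, bidirectional=True)
# Forward pass on CPU; [0] selects the output sequence
print(model(embeddings)[0])

# Move the same model (same weights) to MPS and repeat with the same input
model.to("mps")
print(model(embeddings.to("mps"))[0])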

Results are below: the CPU output comes first, the MPS output second. How can the results on CPU differ from those on MPS?
Am I missing something?
I’m testing on a MacBook Pro with an M1 Pro, 32 GB, 10-core CPU, 16-core GPU.

Torch version:  2.0.0.dev20230205
tensor([[[ 0.0054,  0.2338, -0.1167,  0.2263]],
        [[ 0.0200,  0.3305, -0.0967,  0.2075]],
        [[ 0.0422,  0.3740, -0.0590,  0.1553]]], grad_fn=<CatBackward0>)
tensor([[[0.2325, 0.2447, 0.4149, 0.4394]],
        [[0.3524, 0.3702, 0.3516, 0.3785]],
        [[0.4139, 0.4346, 0.2306, 0.2531]]], device='mps:0',
       grad_fn=<CatBackward0>)

Thanks for reporting the issue! I believe this could be a valid bug in the MPS backend, so could you also create a GitHub issue for it as described here?
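
For the issue, a self-contained check along these lines might help quantify the mismatch. This is only a sketch: the helper name max_abs_diff is made up for illustration, and the fallback to CPU when MPS is unavailable is an assumption added so the script runs anywhere. It copies the model before moving it, so both devices are guaranteed to use identical weights.

```python
import copy

import torch
import torch.nn as nn


def max_abs_diff(device: str, seed: int = 123) -> float:
    """Run one bidirectional LSTM on CPU and on `device`; return the largest output gap.

    Hypothetical helper for illustration, not part of PyTorch.
    """
    torch.manual_seed(seed)
    embeddings = torch.rand(3, 1, 2)  # L, B, E
    cpu_model = nn.LSTM(input_size=2, hidden_size=2, bidirectional=True)
    # Deep-copy before moving so both models hold identical weights.
    dev_model = copy.deepcopy(cpu_model).to(device)
    # Keep autograd enabled, as in the original snippet, and detach afterwards.
    cpu_out = cpu_model(embeddings)[0].detach()
    dev_out = dev_model(embeddings.to(device))[0].detach().cpu()
    return (cpu_out - dev_out).abs().max().item()


# Assumption: fall back to CPU when MPS is not available on this machine.
device = "mps" if torch.backends.mps.is_available() else "cpu"
diff = max_abs_diff(device)
print(f"device={device}  max abs diff={diff:.6f}")
```

A difference near zero would be expected if the two backends agree. The outputs posted above differ by more than 0.2 in several entries, far more than float32 rounding would explain.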