MPS large tensor handling bug

I seem to have stumbled into an MPS bug.
I ran this experiment on a MacBook Pro with an M3 Max and found some very weird behavior:
import torch
a = torch.rand(55987200) > 0.5
a[a].all()  # → True (CPU)
a = torch.rand(55987200, device="mps") > 0.5
a[a].all()  # → False (MPS)
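The two lines above draw independent random tensors on each device. Here is a minimal sketch that builds one mask on the CPU and copies it to the MPS device, so both backends index exactly the same data. It assumes a recent PyTorch build. `masked_select_all` is a hypothetical helper name, and the script falls back to the CPU when `torch.backends.mps.is_available()` is false.

```python
import torch

def masked_select_all(n: int, device: str) -> bool:
    """Return a[a].all() for the same seeded random boolean mask on `device`."""
    torch.manual_seed(0)
    mask_cpu = torch.rand(n) > 0.5  # generate on the CPU so every device sees identical data
    mask = mask_cpu.to(device)
    # Every element selected by a boolean mask from itself should be True.
    return bool(mask[mask].all())

if __name__ == "__main__":
    # Fall back to the CPU if this machine has no MPS device.
    device = "mps" if torch.backends.mps.is_available() else "cpu"
    n = 55_987_200  # size from the report above
    print("cpu:", masked_select_all(n, "cpu"))
    print(f"{device}:", masked_select_all(n, device))
```

If the bug is real, the MPS line should print `False` while the CPU line prints `True`, even though both runs start from the same mask.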

It seems that a[a] has the correct shape on both devices, so the comparison part of the code appears to be correct. However, something seems to go wrong when the selected elements are copied into the large output buffer on MPS, because a.nonzero() contains many repeated indices.
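The shape-versus-content observation can be checked directly. Below is a minimal diagnostic sketch. It assumes a 1-D boolean mask, and `nonzero_report` is a hypothetical helper name. It runs a[a] and a.nonzero() on the mask's own device and then moves the results to the CPU for counting, so the counting itself does not depend on MPS kernels.

```python
import torch

def nonzero_report(mask: torch.Tensor) -> dict:
    """Compare a[a] and a.nonzero() against the true count for a 1-D bool mask."""
    selected = mask[mask].cpu()            # boolean-mask indexing on the mask's device
    idx = mask.nonzero().flatten().cpu()   # indices of True elements, shape [k]
    return {
        "expected": int(mask.cpu().sum()),           # true number of True elements
        "selected_numel": selected.numel(),          # length of a[a]
        "selected_all_true": bool(selected.all()),   # should always be True
        "nonzero_numel": idx.numel(),                # length of a.nonzero()
        "nonzero_unique": torch.unique(idx).numel(), # < nonzero_numel means repeats
    }

if __name__ == "__main__":
    device = "mps" if torch.backends.mps.is_available() else "cpu"
    mask = (torch.rand(55_987_200) > 0.5).to(device)
    print(device, nonzero_report(mask))
```

On a correct backend, all three counts match and `selected_all_true` is `True`. Matching lengths with `nonzero_unique` smaller than `nonzero_numel` would be consistent with the repeated indices described above.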

Has anyone else noticed this bug?