LSTM inference generates different results on GPU vs. CPU

I have an LSTM model whose inference results on the CPU differ from its inference results on the GPU. I'm wondering why.

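Depending on how large the differences are, you might be observing the expected errors caused by limited floating-point precision and a different order of operations. Floating-point addition is not associative, so when the GPU kernels accumulate the sums inside the LSTM's matrix multiplications in a different order than the CPU implementation does, the results can differ in the last few bits, and the recurrence can carry these small discrepancies forward through the time steps.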
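Here is a minimal pure-Python sketch (the question doesn't say which framework is used, so nothing here is framework-specific) showing that summing the same numbers in a different order gives a slightly different result, and how to compare two outputs with a tolerance instead of exact equality. `sum_in_order` and `outputs_match` are hypothetical helpers for illustration; `outputs_match` applies the same rule as `numpy.allclose` and `torch.allclose` (`|a - b| <= atol + rtol * |b|`, default `rtol=1e-5`, `atol=1e-8`), which is what you would use on real CPU and GPU output tensors.

```python
def sum_in_order(values, order):
    """Accumulate `values` following the index sequence `order` (hypothetical helper)."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


def outputs_match(cpu_out, gpu_out, rtol=1e-5, atol=1e-8):
    """Hypothetical helper: elementwise |a - b| <= atol + rtol * |b|, as in numpy.allclose."""
    return len(cpu_out) == len(gpu_out) and all(
        abs(a - b) <= atol + rtol * abs(b) for a, b in zip(cpu_out, gpu_out)
    )


values = [0.1, 0.2, 0.3]
forward = sum_in_order(values, [0, 1, 2])   # (0.1 + 0.2) + 0.3
backward = sum_in_order(values, [2, 1, 0])  # (0.3 + 0.2) + 0.1

print(forward, backward, forward == backward)    # 0.6000000000000001 0.6 False
print(outputs_match([forward], [backward]))      # True: equal within tolerance
```

If the CPU and GPU outputs pass a tolerance check like this, the mismatch is most likely just rounding; differences far larger than such tolerances are less likely to be explained by the order of operations alone.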