Single TPU core gives inference results different from CPU results

I encountered an issue when using PyTorch XLA to train a model on TPU. My main code gives different results than training on CPU or GPU, so I checked with a toy example and found that prediction with PyTorch XLA gives results different from prediction on CPU.
I also tried PyTorch Lightning, and it gives the same result as CPU. How do I set up PyTorch XLA so it gives results identical to Lightning's?
Notebook

Hello,

I encountered a discrepancy when training a model with PyTorch XLA on a TPU: the results differed significantly from those obtained on CPU or GPU. Investigating with a toy example, I noticed that predictions made with PyTorch XLA were not consistent with those made on the CPU. Interestingly, when training on TPU with PyTorch Lightning, the results were identical to the CPU output. This leads me to suspect device-specific differences or an initialization issue when using PyTorch XLA directly.
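One way to isolate initialization effects from device effects is to build the model and input under a fixed seed for every run, establish a deterministic CPU baseline, and only then move the same seeded model to the XLA device. A minimal sketch (the CPU part is standard PyTorch; the commented TPU part assumes `torch_xla` with its `xm.xla_device()` / `xm.mark_step()` API and requires TPU hardware):

```python
import torch
import torch.nn as nn

def make_model_and_input(seed: int = 0):
    # Seed before creating the model so weight init and the input
    # are identical on every call, regardless of target device.
    torch.manual_seed(seed)
    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
    x = torch.randn(3, 4)
    return model, x

# Two independently constructed, identically seeded CPU runs
# should produce bitwise-equal predictions.
model_a, x_a = make_model_and_input(seed=0)
model_b, x_b = make_model_and_input(seed=0)
with torch.no_grad():
    out_a = model_a(x_a)
    out_b = model_b(x_b)
print(torch.equal(out_a, out_b))

# Sketch of the TPU comparison (assumes torch_xla is installed):
# import torch_xla.core.xla_model as xm
# device = xm.xla_device()
# model_t, x_t = make_model_and_input(seed=0)
# with torch.no_grad():
#     out_t = model_t.to(device)(x_t.to(device))
# xm.mark_step()  # force execution of the lazy XLA graph
# # XLA math need not be bitwise identical to CPU; compare with a tolerance.
# print(torch.allclose(out_a, out_t.cpu(), atol=1e-5))
```

If the CPU baseline itself matches across runs but the TPU output diverges beyond a small tolerance, the difference comes from the XLA execution path rather than from initialization.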

Best Regards,
Frank
