Handling GPU/CPU compute differences

Based on the error of ~1e-5, you are most likely running into small errors caused by the limited floating point precision of float32.

  1. Using a wider dtype (e.g. float64) is not a magic fix, but it will give you more precision and thus reduce the error (see the sketch after this list).

  2. However, on GPUs you would expect to see poor performance when using float64.

  3. The mismatch is not necessarily only visible between CPU and GPU calculations, but depends on the order of operations, which can also change on the same device, as seen e.g. here:

import torch

x = torch.randn(100, 100)
s1 = x.sum()          # single reduction over all elements
s2 = x.sum(0).sum(0)  # chained reductions change the accumulation order
print((s1 - s2).abs())
# tensor(1.9073e-05)
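
To illustrate point 1, here is a minimal sketch repeating the same comparison in float64; the exact value varies between runs, but the difference should shrink by several orders of magnitude:

import torch

# Same comparison as above, but using a wider dtype (float64).
x = torch.randn(100, 100, dtype=torch.float64)
s1 = x.sum()
s2 = x.sum(0).sum(0)
print((s1 - s2).abs())
# expected to be on the order of ~1e-13 instead of ~1e-5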

Note that neither device is inherently more precise than the other, as both are using the IEEE 754 floating point standard (unless you are using TF32 on Ampere GPUs). Take a look at this Wikipedia article for more information.
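
If you are on an Ampere (or newer) GPU and want to rule TF32 out as the source of the mismatch, a minimal sketch of disabling it (note that the default values of these flags have changed between PyTorch releases):

import torch

# Make float32 matmuls and cuDNN convolutions use the standard IEEE
# float32 path instead of TF32 on Ampere (and newer) GPUs.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False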