CPU and GPU inference differences

I’m running the following script that creates a PyTorch model and runs inference on both CPU and GPU. I get differences as high as 2e-4. Is this expected with float32?

import torch
from monai.networks.nets import VarAutoEncoder

# model construction omitted in the post; a plausible configuration, with
# in_shape and latent_size inferred from the input and the 20-element mu below:
model = VarAutoEncoder(spatial_dims=2, in_shape=(6, 64, 64), out_channels=6,
                       latent_size=20, channels=(16, 32, 64), strides=(2, 2, 2))

# CPU inference
model.eval()
input = torch.ones((1, 6, 64, 64))
recon, mu, logvar, _ = model(input)

# move the model and input to the GPU and repeat the forward pass
device = torch.device('cuda:0')
model = model.to(device)
input = input.to(device)
recon_gpu, mu_gpu, logvar_gpu, _ = model(input)


diff = mu_gpu.detach().cpu().numpy() - mu.detach().numpy()
print(f'diff: {diff}')
diff: [[-7.45356083e-05 -3.43009830e-04  1.75867230e-04 -1.16109848e-04
  -4.17232513e-05  2.66432762e-05 -1.32430345e-04 -1.16214156e-04
  -5.60581684e-05  2.44081020e-04 -9.73343849e-05  2.08556652e-04
   1.40473247e-04 -1.85698271e-04 -2.05576420e-04  4.55528498e-05
   1.46627426e-04  2.26497650e-06 -1.25408173e-04 -1.09910965e-04]]

Hi Chait!

Differences at this level don’t surprise me. (As an aside, without knowing the typical
scale of mu, we don’t actually know how large diff is relative to the size of mu, so
I am assuming that mu is roughly of order one.)
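If it helps, here is a quick way to check that relative scale, reusing the tensor
names from your script (just a sketch, not something your code needs):

import numpy as np

mu_cpu = mu.detach().numpy()
mu_gpu_np = mu_gpu.detach().cpu().numpy()

abs_diff = np.abs(mu_gpu_np - mu_cpu)
# guard against division by values near zero when forming the relative difference
rel_diff = abs_diff / np.maximum(np.abs(mu_cpu), 1e-12)

print('max abs diff:', abs_diff.max())
print('max rel diff:', rel_diff.max())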

Your cpu and gpu computations could easily be performing various arithmetic operations
in mathematically equivalent, but numerically different, orders, thereby accumulating
different round-off error. A single float32 operation typically has a relative round-off
error on the order of 1e-7 (float32 machine epsilon is about 1.2e-7), and such errors
accumulate as the computation proceeds. It wouldn’t surprise me for the output of a
reasonably deep model (lots of layers) to end up with cpu and gpu accumulated round-off
errors that differ on the order of 1e-4.
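As a concrete toy illustration (hypothetical, and unrelated to MONAI or the gpu),
summing the same float32 values in two mathematically equivalent orders already gives
slightly different answers:

import torch

torch.manual_seed(0)
x = torch.randn(10000, dtype=torch.float32)

s_sequential = x.sum()                         # one accumulation order
s_chunked = x.view(100, 100).sum(dim=1).sum()  # partial sums first, then combine

print(torch.finfo(torch.float32).eps)          # ~1.19e-07
print((s_sequential - s_chunked).item())       # usually a small, nonzero difference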

You might try converting your model to double() (torch.float64) and see if diff
shrinks by several (about nine) orders of magnitude.
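
Continuing from your script, the float64 comparison could look something like this
(a sketch only; it assumes model and input are still on the gpu at that point):

# convert to float64 and repeat the cpu/gpu comparison
model = model.double()
input64 = input.double()

recon64_gpu, mu64_gpu, _, _ = model(input64)
recon64_cpu, mu64_cpu, _, _ = model.cpu()(input64.cpu())

diff64 = (mu64_gpu.detach().cpu() - mu64_cpu.detach()).abs().max()
print(f'float64 max abs diff: {diff64.item()}')  # roughly 1e-13 or smaller, if round-off is the cause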

Best.

K. Frank