Why does the same model give different results on CUDA and on the CPU?

You are not the first one to have such a problem. I have a similar one here, and there is another unanswered one here.

I suggest you first try to locate the source of the divergence yourself; that makes it easier for others to help you. My code was in Python, not C++, but I'll share it here so you get the idea of how to locate the problem.

  1. Save off the intermediate variables during both CPU and GPU inference:

    torch.save(variable, "/path/to/varfile")

  2. Afterwards, load both for analysis (map the GPU copy onto the CPU so they are directly comparable):

    cpuvar = torch.load("/path/to/varfile_cpu", map_location="cpu")
    gpuvar = torch.load("/path/to/varfile_gpu", map_location="cpu")

  3. Compare:

    close = torch.isclose(cpuvar, gpuvar, rtol=1e-04, atol=1e-04)
    print("SIMILAR", close[close==True].shape)
    print("FAR", close[close==False].shape)

In the perfect case, CPU and GPU produce similar results for the same input. Compare all intermediate variables, layer by layer, until you find where the divergence starts.
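Putting the three steps together, here is a minimal self-contained sketch of the workflow. The toy linear layer, the temp-file paths, and the variable names are just placeholders I made up for illustration; substitute your own model and the real intermediate activations you want to inspect. The script falls back to the CPU if no GPU is present, so it runs anywhere:

```python
import os
import tempfile

import torch

torch.manual_seed(0)
x = torch.randn(4, 8)
layer = torch.nn.Linear(8, 8)  # hypothetical stand-in for one layer of your model
tmpdir = tempfile.mkdtemp()

# Step 1: save the same intermediate variable from a CPU run and a GPU run.
with torch.no_grad():
    cpu_out = layer(x)
torch.save(cpu_out, os.path.join(tmpdir, "varfile_cpu"))

device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back without a GPU
with torch.no_grad():
    gpu_out = layer.to(device)(x.to(device))
torch.save(gpu_out, os.path.join(tmpdir, "varfile_gpu"))

# Step 2: load both onto the CPU so they are directly comparable.
cpuvar = torch.load(os.path.join(tmpdir, "varfile_cpu"), map_location="cpu")
gpuvar = torch.load(os.path.join(tmpdir, "varfile_gpu"), map_location="cpu")

# Step 3: count elementwise matches vs. divergences within tolerance.
close = torch.isclose(cpuvar, gpuvar, rtol=1e-04, atol=1e-04)
print("SIMILAR", close.sum().item(), "of", close.numel())
print("FAR", (~close).sum().item())
```

Counting with `close.sum()` instead of slicing by the boolean mask gives the same information as the shapes printed above, just as plain numbers.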