Hello PyTorch Community,
I am running inference with a pre-trained model that was trained on a segmentation task.
When I evaluate on CUDA (GPU) on my GPU cluster, I get different results than when I evaluate on the CPU of my local Windows machine.
Here are some visual results. I found two likely reasons behind this, but I am not sure the visual results should vary that much:
Floating-point precision differences
Differences in the execution order of operations (a small demonstration follows below)
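For example (a quick illustration of the second point, independent of my model), the order of floating-point summation alone can change the result:

```python
# Summing the same values in a different order gives a different float result.
vals = [1.0] * 10 + [1e16]

print(sum(vals))            # 1.000000000000001e+16: the small values accumulate first
print(sum(reversed(vals)))  # 1e+16: each 1.0 is rounded away against the large value
```

GPU kernels typically reduce in a different (parallel) order than CPU code, so small deviations like this are expected; the question is whether they should be visible in the predictions.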
Any clues would be appreciated.
Thank You
Cheers
Abbas
This is potentially expected, especially if the model was trained on the CPU and then moved to the GPU. If the GPUs are Ampere (sm80) or newer, you might want to check whether setting NVIDIA_TF32_OVERRIDE=0 changes the results (this env var turns off the internal use of the TF32 dtype).
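As a minimal sketch of the same idea from Python (assuming a PyTorch version that exposes these backend flags, which recent releases do), you can also disable TF32 for PyTorch's matmul and cuDNN paths directly:

```python
import torch

# Run FP32 matmuls and convolutions in full FP32 precision on Ampere+ GPUs
# instead of letting them internally use TF32 (which keeps the FP32 range
# but only ~10 bits of mantissa). This should have a similar effect to
# NVIDIA_TF32_OVERRIDE=0 for these PyTorch code paths.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
```

Set these before running inference; with TF32 enabled, per-layer relative differences on the order of 1e-3 are plausible.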
If there is a serious degradation in accuracy and the env var above doesn't meaningfully improve your results, I would check the layers used by your model one by one to see where the differences are coming from.
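Something along these lines could automate that check. This is a rough sketch: capture_outputs is a hypothetical helper built on forward hooks, and the small Sequential model and random input are stand-ins for your segmentation network and a real batch (requires a CUDA device):

```python
import torch
import torch.nn as nn

def capture_outputs(model, x):
    """Run model on x and record every leaf module's output (moved to CPU)."""
    outputs, hooks = {}, []
    for name, module in model.named_modules():
        if not list(module.children()):  # hook leaf modules only
            def hook(mod, inp, out, name=name):
                if torch.is_tensor(out):
                    outputs[name] = out.detach().float().cpu()
            hooks.append(module.register_forward_hook(hook))
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()
    return outputs

# Stand-ins: replace with your segmentation network and a real input batch.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 2, 1)).eval()
x = torch.randn(1, 3, 64, 64)

cpu_outs = capture_outputs(model.cpu(), x.cpu())
gpu_outs = capture_outputs(model.cuda(), x.cuda())

# Compare per-layer outputs between the two devices.
for name in cpu_outs:
    diff = (cpu_outs[name] - gpu_outs[name]).abs().max().item()
    print(f"{name}: max abs diff = {diff:.3e}")
```

Tiny differences (around 1e-6 for FP32) everywhere are normal; the first layer whose difference jumps by orders of magnitude is the one worth inspecting.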
Hello, thanks for your reply.
The model is trained on GPU and then tested on both CPU and GPU.
The GPU is A100.
Thanks for the suggestions; I will try them.
Cheers
Abbas