C compiler dependency on training performance

Hello there,
I found that my model's training performance depends on which C compiler was used (the Intel compiler vs. GCC).
For my model (it uses PyTorch and PyTorch Geometric), the training loss stays large and saturates early when using the Intel compiler, while the model trains well with GCC (even on two different machines, as long as the compiler is the same).
The difference is quite large (about 50%), which is usually not acceptable for the same model and hyperparameters.
Is this natural behavior, or is it my fault?

My environment is as below.
Python 3.9.15
PyTorch 1.12.1
PyG 2.2.0
CUDA 10.2 / 11.6
Intel compiler 19.1
GCC 4.8.5 / 10.2
(The two machines use different CUDA and GCC versions, but are otherwise the same.)

I think it's natural. It depends on the compiler flags and on how the compiler does its optimizations, which can differ a lot in the end, and the OS plays a role too. The same is true for PyTorch itself, because it also has features that aren't necessarily turned on (e.g. AVX instruction sets and certain third-party libraries) on top of the compiler settings. You can improve things a lot if you have the time for it.
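A minimal sketch (plain Python, no PyTorch needed) of why compiler optimizations can change numerics: floating-point addition is not associative, so a compiler that reorders or vectorizes a reduction can produce a slightly different sum, and those differences can compound over many training steps.

```python
# Floating-point addition is not associative: the order in which a
# compiler (or a vectorized kernel) accumulates a sum changes the result.
vals = [1e16, 1.0, -1e16]

# Left-to-right accumulation: the 1.0 is absorbed into 1e16 and lost.
left_to_right = (vals[0] + vals[1]) + vals[2]   # -> 0.0

# Reordered accumulation, as an optimizer might emit: the 1.0 survives.
reordered = (vals[0] + vals[2]) + vals[1]       # -> 1.0

print(left_to_right, reordered)
```

Per-operation differences like this are tiny in a real run, but they can steer gradient descent down a different path. Note that the classic Intel compiler defaults to a fast floating-point model (`-fp-model fast=1`), which permits more aggressive reordering than GCC's stricter defaults, so a larger divergence between the two builds is plausible.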