Reproducibility cannot be guaranteed after torch.compile

When I add torch.compile to the training pipeline, the evaluation results differ between runs even though I fix the random seed in numpy, random, torch, and torch.cuda.

After I remove torch.compile, the results are deterministic across all runs.

Are you seeing non-deterministic behavior with torch.compile after following the steps mentioned in the reproducibility docs, or are you only seeding the code?

Yes, I have followed the documentation: I not only set the random seeds but also set cudnn.deterministic and use a seeded torch.Generator.
My code had a lot of nondeterminism before, so I followed the documentation a long time ago to make it reproducible, but this reproducibility broke after I recently added torch.compile.
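Roughly, my setup looks like this (a minimal sketch of those steps; the seed value and generator usage are illustrative):

import random
import numpy as np
import torch

seed = 0
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
# seeded generator, e.g. for DataLoader shuffling
g = torch.Generator()
g.manual_seed(seed)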

Would it be possible to share the model so that we could reproduce it?

I’m sorry, but I can’t share our model.
But I have observed some phenomena:
First of all, the run-to-run difference from torch.compile on GPU is smaller than from torch.compile on CPU.
Secondly, I use AMP in my code; does torch.compile require running under AMP?

No, torch.compile does not require amp.
Just to understand the issue a bit better: are you seeing differences in the outputs for repeated forward passes after calling torch.compile(model) or are you comparing the plain eager mode model vs. the compiled one?
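For example, a toy comparison like this (illustrative model and input) would separate the two cases:

import torch

torch.manual_seed(0)
model = torch.nn.Linear(16, 16).eval()
x = torch.randn(4, 16)
compiled = torch.compile(model)

with torch.no_grad():
    eager_out = model(x)
    out1 = compiled(x)
    out2 = compiled(x)

# run-to-run difference of the compiled model
print((out1 - out2).abs().max())
# eager vs. compiled difference
print((eager_out - out1).abs().max())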

Are you seeing a “results may differ from eager” type of warning in your logs? If so, you might want to enable this flag: pytorch/config.py at master · pytorch/pytorch · GitHub

import torch._inductor.config as iconfig
# fall back to eager-mode random kernels so RNG output matches eager
iconfig.fallback_random = True

There’s also been a new PR added to help with determinism here: [inductor] fix scatter fallback and fallback in deterministic mode by yuguo68 · Pull Request #98339 · pytorch/pytorch · GitHub

Finally, if none of those solves your problem, please try the accuracy minifier, which might indicate which line of your model diverges after compilation: PyTorch 2.0 Troubleshooting — PyTorch master documentation
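If I remember the troubleshooting docs correctly, the accuracy minifier is enabled through the _dynamo config (or the matching TORCHDYNAMO_REPRO_AFTER / TORCHDYNAMO_REPRO_LEVEL environment variables), roughly like this:

import torch._dynamo.config as dconfig

# dump a minified repro after AOTAutograd; level 4 minifies accuracy failures
dconfig.repro_after = "aot"
dconfig.repro_level = 4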

Yes, the difference you mentioned exists. There is a difference between the outputs of eager mode and compile mode; the maximum difference is about 0.0003 for the same input.

But the difference I’m talking about is between different runs of the same code after torch.compile.
After the first epoch, the loss value for the first run is 0.001052 and for the second it is 0.001051.
Since my test loss is sensitive to small differences in the model, this results in two test losses of 0.000195 and 0.000173 respectively.

I have the same issue with PyTorch 2.0.1