TL;DR: After switching to torch.cuda.amp, I get deterministic training even though the torch.backends.cudnn options are set to deterministic=False, benchmark=False, etc.
Environment
- 2080Ti (CUDA 11.2, Driver 460.91.03)
- PyTorch 1.11.0.dev20211127
- Python 3.9.7
I experimented with this minimal MNIST example and reproduced the nondeterminism across training runs (i.e., different epoch losses each time I train from scratch). Since all random seeds are fixed, the nondeterminism must come from GPU operations.
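For reference, the seeding and cuDNN setup I use looks roughly like this (a minimal sketch; the exact seed value and flags in the linked example may differ):

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 0) -> None:
    # Fix every RNG so any remaining run-to-run variation comes from GPU ops.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # seeds CPU RNG and the default CUDA generator
    torch.cuda.manual_seed_all(seed)  # seeds all CUDA devices

set_seed(0)

# The flags in question, left at their non-deterministic values:
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.benchmark = False
```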
If I set torch.backends.cudnn.deterministic=True, training becomes deterministic, just as the original author reports.
However, if I use amp for mixed-precision training, training is also deterministic even with deterministic=False.
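In case it helps, this is the rough shape of the AMP path I tested (a sketch, not the exact training loop from the linked example; the tiny model and random data here are placeholders for the MNIST setup):

```python
import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler

# Stand-in model, optimizer, and loss; the real example uses a small MNIST CNN.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
scaler = GradScaler()

for step in range(10):
    data = torch.randn(64, 1, 28, 28, device="cuda")
    target = torch.randint(0, 10, (64,), device="cuda")

    optimizer.zero_grad()
    with autocast():                  # run the forward pass in mixed precision
        loss = criterion(model(data), target)
    scaler.scale(loss).backward()     # scale the loss to avoid fp16 gradient underflow
    scaler.step(optimizer)            # unscales the gradients, then steps the optimizer
    scaler.update()                   # adjust the loss scale for the next iteration
```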
Has anyone seen similar behavior, or does anyone have insight into how using amp could remove the nondeterminism?
Here’s a writeup with further details on experiments and code.
Thanks!