TL;DR: After using `torch.cuda.amp`, I have deterministic training, even though I set the `torch.backends.cudnn` options to `deterministic=False`, `benchmark=False`, etc.
Environment
- 2080Ti (CUDA 11.2, Driver 460.91.03)
- PyTorch 1.11.0.dev20211127
- Python 3.9.7
I experimented with this minimal MNIST example and reproduced the nondeterminism across training runs (i.e., different epoch losses each time I train from scratch). Since the random seeds are fixed, the source of the nondeterminism must be GPU operations.
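For context, here's roughly how I fix the seeds and set the cudnn flags before each run (a simplified sketch; the helper name is mine, not from the linked example):

```python
import random
import numpy as np
import torch

def seed_everything(seed: int = 0):
    # Fix every RNG source so any remaining run-to-run variation
    # has to come from nondeterministic GPU kernels.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

seed_everything(0)

# The nondeterministic configuration from the TL;DR:
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.benchmark = False
```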
If I set `torch.backends.cudnn.deterministic=True`, I see deterministic training, like the original author. However, if I use `amp` for mixed-precision training, I also see determinism, even without `deterministic=True`.
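By "use amp" I mean the standard `torch.cuda.amp` recipe, roughly like this (the model, optimizer, and data below are stand-ins for the objects from the MNIST example, not the actual code):

```python
import torch
from torch import nn, optim
from torch.cuda.amp import autocast, GradScaler

# Stand-ins for the MNIST example's model, optimizer, and data loader
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).cuda()
optimizer = optim.SGD(model.parameters(), lr=0.01)
loader = [(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,)))]

scaler = GradScaler()

for data, target in loader:
    data, target = data.cuda(), target.cuda()
    optimizer.zero_grad()
    with autocast():                 # forward pass runs in float16 where safe
        output = model(data)
        loss = nn.functional.cross_entropy(output, target)
    scaler.scale(loss).backward()    # scale loss to avoid fp16 gradient underflow
    scaler.step(optimizer)           # unscales gradients, then calls optimizer.step()
    scaler.update()                  # adjust the scale factor for the next iteration
```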
Has anyone seen similar behavior, or does anyone have insight into how using `amp` could remove the nondeterminism?
Here’s a writeup with further details on experiments and code.
Thanks!