According to the doc:
There are some PyTorch functions that use CUDA functions that can be a source of non-determinism. One class of such CUDA functions are atomic operations, in particular `atomicAdd`, where the order of parallel additions to the same value is undetermined and, for floating-point variables, a source of variance in the result.
And a number of operations have backwards that use `atomicAdd`, such as many forms of pooling, padding, and sampling. There is currently no simple way of avoiding non-determinism in these functions.
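You can see this in practice with a scatter-style reduction, which on CUDA is backed by `atomicAdd`. Here is a minimal sketch (assuming a CUDA device is available; `index_add_` is just one illustrative `atomicAdd`-backed op, not the only one affected):

```python
import torch

# Minimal sketch: accumulate many values into a few slots on the GPU.
# On CUDA, index_add_ uses atomicAdd, so the order of the floating-point
# additions is unspecified and the result can vary between runs.
device = "cuda"
torch.manual_seed(0)

src = torch.randn(1_000_000, device=device)
idx = torch.randint(0, 10, (1_000_000,), device=device)

out1 = torch.zeros(10, device=device).index_add_(0, idx, src)
out2 = torch.zeros(10, device=device).index_add_(0, idx, src)

# Even with identical inputs, the two sums may differ in the last bits,
# because floating-point addition is not associative.
print(torch.equal(out1, out2))    # may print False on a GPU
print((out1 - out2).abs().max())  # tiny, but possibly non-zero
```

Note that seeding with `torch.manual_seed` does not help here: the variance comes from the scheduling of the parallel additions, not from random number generation.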