What is the difference between cudnn.deterministic and cudnn.benchmark?

What are the differences between cudnn.deterministic and cudnn.benchmark?
How should I know which one to use?
Do they affect the results themselves, or only the runtime?


Hi,

This link covers torch.backends.cudnn.benchmark.
As for torch.backends.cudnn.deterministic, in my opinion it makes your experiment reproducible, similar to setting a random seed for every component that needs one.
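
For example, a minimal reproducibility setup along those lines might look like this (the seed value is arbitrary, and seeding Python and NumPy is only needed if your code actually uses them):

```python
import random

import numpy as np
import torch

# Seed every source of randomness the script uses (the value 42 is arbitrary)
seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)

# Restrict CuDNN to deterministic algorithms
torch.backends.cudnn.deterministic = True
```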

😄


Even though you asked about differences, first the obvious similarity: both affect CuDNN and thus only operations that dispatch to CuDNN implementations. There is ongoing work on broadening both the documentation of, and the ability to get, the same result when running the same thing twice, and the reproducibility note covers, modulo bugs, the sources of nondeterminism in other (CUDA) operations.
As background for CuDNN, it is important to realize that, for many operations, CuDNN has several implementations, let’s call them different algorithms.
Now cudnn.deterministic will only allow those CuDNN algorithms that are (believed to be) deterministic. Crucially for what follows, there still might be several left, though. This means that you would expect to get the exact same result if you run the same CuDNN ops with the same inputs on the same system (same box with same CPU and GPU, with PyTorch, CUDA, and CuDNN versions unchanged), provided CuDNN picks the same algorithms from the set it has available.
Now, CuDNN usually has heuristics for which algorithm to pick that, roughly, depend on the input shape, strides (i.e. memory layout), and dtype. Those heuristics cover a broad set of cases, but, being heuristics, they sometimes pick a less efficient algorithm. To improve on the heuristics, if you set cudnn.benchmark, the CuDNN library will benchmark several algorithms and pick the one it found to be fastest. There are some rules as to when and how this is done (you'd have to check their documentation for details; rule of thumb: useful if you have fixed input sizes).

This means the benchmarking may pick a different algorithm from run to run (due to other things running on the host box etc.) even when the deterministic flag is set. As such, it seems good practice to turn off cudnn.benchmark when turning on cudnn.deterministic.
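
In code, the two common configurations boil down to something like this (which one you want depends on whether you need run-to-run reproducibility or raw speed):

```python
import torch

# Reproducibility: restrict CuDNN to deterministic algorithms and disable
# benchmarking, so the algorithm selection cannot change between runs
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# Speed (fixed input sizes): let CuDNN benchmark its algorithms and pick the
# fastest one, giving up the determinism guarantee
# torch.backends.cudnn.deterministic = False
# torch.backends.cudnn.benchmark = True
```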

Best regards

Thomas
