What is the difference between cudnn.deterministic and cudnn.benchmark?

isalirezag · February 23, 2019, 10:07pm

What are the differences between cudnn.deterministic and cudnn.benchmark?
How should I know which one to use?
Do they affect the performance results, or only the running time?

MariosOreo · February 24, 2019, 1:23am

Hi,

This link is for torch.backends.cudnn.benchmark.
As for torch.backends.cudnn.deterministic, in my opinion it can make your experiment reproducible, similar to setting a random seed everywhere one is needed.
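A minimal sketch of combining the two ideas (seeding every RNG and restricting cuDNN to deterministic algorithms). The helper name `make_reproducible` is hypothetical; it only seeds Python's `random` and PyTorch, so if your code also uses NumPy or other libraries with their own RNGs, those would need seeding too.

```python
import random

import torch


def make_reproducible(seed: int = 0) -> None:
    # Seed the RNGs this script relies on: Python, PyTorch CPU and all GPUs.
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    # These two flags only affect operations that dispatch to cuDNN.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


make_reproducible(42)
a = torch.rand(3)
make_reproducible(42)
b = torch.rand(3)
print(torch.equal(a, b))
```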

tom · February 24, 2019, 11:21am

Even though you asked about differences, first the obvious similarity: both flags affect CuDNN and, as such, only affect operations when they dispatch to CuDNN implementations. There is ongoing work on better documenting, and broadening, the ability to get the same result when running the same thing twice, and the reproducibility note covers (modulo bugs) the sources of nondeterminism in other (CUDA) operations.
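As a rough illustration of that scope, here is a hypothetical helper (`cudnn_flags_can_matter` is not a PyTorch function) that checks the preconditions under which the two flags can have any effect at all: a CUDA tensor, a PyTorch build with cuDNN, and cuDNN not disabled via `torch.backends.cudnn.enabled`. These conditions are necessary but not sufficient; many CUDA ops (e.g. matrix multiplies, which go through cuBLAS) never touch cuDNN.

```python
import torch


def cudnn_flags_can_matter(t: torch.Tensor) -> bool:
    # Hypothetical helper: cudnn.deterministic / cudnn.benchmark can only
    # influence ops on CUDA tensors, in a cuDNN-enabled build, with cuDNN on.
    return (
        t.is_cuda
        and torch.backends.cudnn.is_available()
        and torch.backends.cudnn.enabled
    )


print("cuDNN available:", torch.backends.cudnn.is_available())
print(cudnn_flags_can_matter(torch.zeros(1)))  # a CPU tensor never uses cuDNN
```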
As background for CuDNN, it is important to realize that, for many operations, CuDNN has several implementations, let’s call them different algorithms.
Now cudnn.deterministic will only allow those CuDNN algorithms that are (believed to be) deterministic. Crucially for what follows, there might still be several left, though. This means you would expect to get the exact same result if you run the same CuDNN ops with the same inputs on the same system (same box with the same CPU and GPU, and unchanged PyTorch, CUDA and CuDNN versions), provided CuDNN picks the same algorithm from the set it has available.
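A small sketch of the "same op, same inputs, same system" check: run a convolution forward and backward twice and compare outputs and gradients bitwise with `torch.equal`. The shapes are arbitrary assumptions; on a machine without CUDA it falls back to the CPU, where the cuDNN flags have no effect at all.

```python
import torch
import torch.nn.functional as F

# Only deterministic cuDNN algorithms, and no benchmarking that could change the pick.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(0)
x = torch.randn(8, 3, 32, 32, device=device, requires_grad=True)
w = torch.randn(16, 3, 3, 3, device=device, requires_grad=True)


def forward_backward():
    # Same op with the same inputs: return the output and both gradients.
    x.grad = None
    w.grad = None
    out = F.conv2d(x, w, padding=1)
    out.sum().backward()
    return out.detach().clone(), x.grad.clone(), w.grad.clone()


first = forward_backward()
second = forward_backward()
print(all(torch.equal(a, b) for a, b in zip(first, second)))
```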
Now, CuDNN usually uses heuristics to pick an algorithm; roughly, these depend on the input shape, strides (aka memory layout) and dtype. Those heuristics cover a broad set of cases, but, being heuristics, they might pick a less efficient algorithm at times. To improve on the heuristics, if you set cudnn.benchmark, the CuDNN library will benchmark several algorithms and pick the one it found to be fastest. There are some rules as to when and how this is done (you'd have to check their documentation for details; rule of thumb: it is useful if you have fixed input sizes). This means benchmarking may pick a different algorithm (due to other things running on the host box etc.) even when the deterministic flag is set. As such, it seems good practice to turn off cudnn.benchmark when turning on cudnn.deterministic.
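To make the interaction concrete, here is a hypothetical toy model in plain Python (none of these names are cuDNN or PyTorch APIs). The "algorithms" are different ways of summing floats; since floating-point addition is not associative, two deterministic orders can still give bitwise-different results. The deterministic flag only filters the candidate set, the heuristic is a fixed rule based on input shape, and benchmarking times the survivors, so its pick can vary between runs.

```python
import math
import random
import time

# Hypothetical toy model; names are illustrative, not real cuDNN/PyTorch APIs.


def sequential_sum(xs):
    # Deterministic: fixed left-to-right order.
    total = 0.0
    for x in xs:
        total += x
    return total


def pairwise_sum(xs):
    # Deterministic: fixed tree-shaped order (a different rounding pattern).
    if len(xs) <= 2:
        return sequential_sum(xs)
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])


def shuffled_sum(xs):
    # Nondeterministic: order changes per call, mimicking unordered atomic adds.
    order = list(xs)
    random.shuffle(order)
    return sequential_sum(order)


# name -> (implementation, is_deterministic)
ALGORITHMS = {
    "sequential": (sequential_sum, True),
    "pairwise": (pairwise_sum, True),
    "shuffled": (shuffled_sum, False),
}


def candidates(deterministic):
    # The deterministic flag only filters; several algorithms may remain.
    return [n for n, (_, det) in ALGORITHMS.items() if det or not deterministic]


def pick_heuristic(xs):
    # Fixed rule on the input "shape" (its length): same shape, same choice.
    return "pairwise" if len(xs) > 1000 else "sequential"


def pick_benchmark(xs, deterministic, repeats=3):
    # Time every allowed candidate and keep the fastest; timing noise from
    # other work on the machine can change the winner between runs.
    best_name, best_time = None, math.inf
    for name in candidates(deterministic):
        fn = ALGORITHMS[name][0]
        start = time.perf_counter()
        for _ in range(repeats):
            fn(xs)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


def run(xs, deterministic, benchmark):
    name = pick_benchmark(xs, deterministic) if benchmark else pick_heuristic(xs)
    return name, ALGORITHMS[name][0](xs)


random.seed(0)
xs = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-8, 8) for _ in range(5000)]
print(run(xs, deterministic=True, benchmark=False))  # same (name, value) on every run
print(run(xs, deterministic=True, benchmark=True))   # chosen name may vary across runs
print(sequential_sum(xs) == pairwise_sum(xs))
```

The last line compares two deterministic algorithms on the same input; it can print False, which is exactly why a benchmark that switches between deterministic algorithms still breaks bitwise reproducibility. In real PyTorch the benchmark result is cached per input configuration, which is why benchmarking pays off mainly when input sizes stay fixed.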

Best regards

Thomas