Deterministic cuDNN flag results in 2x speedup, how is this possible?

I just found out about the

torch.backends.cudnn.deterministic = True

flag and gave it a try using a simple ResNet18 with MNIST. It turns out setting the flag to True actually results in a 2x speedup (~3 min instead of ~7 min). I ran the exact same code twice with and without

torch.backends.cudnn.deterministic = True

and was wondering how this is possible! I expected exactly the opposite. Could it be due to the implementation of the deterministic convolution operation, i.e., that it simply runs faster on smaller networks? GPU memory usage is ~820 MB in both cases.
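For context, the run looked roughly like the sketch below. The conv1 swap for MNIST's single channel and the hyperparameters are illustrative assumptions, not my exact script:

```python
import time

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

# Toggle this flag between runs; everything else stays identical.
torch.backends.cudnn.deterministic = True

device = torch.device("cuda")
model = torchvision.models.resnet18(num_classes=10).to(device)
# MNIST is grayscale, so swap the stem conv for a 1-channel version.
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False).to(device)

loader = DataLoader(
    torchvision.datasets.MNIST("data", train=True, download=True, transform=T.ToTensor()),
    batch_size=64, shuffle=True)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

start = time.time()
for images, targets in loader:
    images, targets = images.to(device), targets.to(device)
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()
torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
print(f"epoch time: {time.time() - start:.1f}s")
```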


I tried to reproduce it with MNIST on both ResNet18 and ResNet101, and I see similar results:

ResNet18 time for just one epoch:

  • vanilla: ~38s
  • torch.backends.cudnn.deterministic = True: ~17s
  • torch.backends.cudnn.benchmark = True: ~13s

ResNet101 time for one epoch:

  • vanilla: ~77s
  • torch.backends.cudnn.deterministic = True: ~61s
  • torch.backends.cudnn.benchmark = True: ~52s
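To take data loading and the rest of the training loop out of the picture, the per-conv effect can also be isolated with a standalone snippet like this one (conv shape, batch size, and iteration count are arbitrary picks for illustration, not what I ran):

```python
import time

import torch
import torch.nn as nn

def time_conv(benchmark, deterministic, iters=100):
    """Average forward+backward time of one conv layer under the given cuDNN flags."""
    torch.backends.cudnn.benchmark = benchmark
    torch.backends.cudnn.deterministic = deterministic
    conv = nn.Conv2d(64, 64, kernel_size=3, padding=1).cuda()
    x = torch.randn(128, 64, 56, 56, device="cuda")
    for _ in range(10):  # warm-up; also lets benchmark mode run its algorithm search
        conv(x).sum().backward()
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        conv(x).sum().backward()
    torch.cuda.synchronize()
    return (time.time() - start) / iters

for name, (b, d) in [("vanilla", (False, False)),
                     ("deterministic", (False, True)),
                     ("benchmark", (True, False))]:
    print(f"{name}: {time_conv(b, d) * 1e3:.2f} ms/iter")
```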

I would also expect the deterministic kernels to run slower.
Specs:

  • GTX 1080 Ti
  • Driver: 410.78
  • CUDA 10.0

CC @ngimel: is this expected, or are we missing something?


Unfortunately, this is expected in some cases where the cuDNN heuristics are wildly wrong. benchmark = False, deterministic = True uses the default algorithm (implicit precomputed GEMM, IIRC), while benchmark = False, deterministic = False (vanilla) calls the cuDNN heuristics to pick an algorithm, which may or may not be better than always using the default.
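In other words, the selection roughly follows this decision tree (a hypothetical sketch; the helper functions below only name the three code paths and don't exist in PyTorch):

```python
import torch

def pick_conv_algorithm(shape):
    """Simplified mental model of how the flags influence cuDNN algorithm choice."""
    if torch.backends.cudnn.benchmark:
        # benchmark=True: time the candidate algorithms for this input shape
        # once, cache the winner, and reuse it on later calls with that shape.
        return fastest_measured_algorithm(shape)  # hypothetical helper
    if torch.backends.cudnn.deterministic:
        # benchmark=False, deterministic=True: no selection at all, just the
        # default deterministic algorithm (implicit precomputed GEMM).
        return default_algorithm()  # hypothetical helper
    # benchmark=False, deterministic=False ("vanilla"): ask cuDNN's heuristics
    # for a recommendation, which can guess much worse than the default.
    return cudnn_heuristic_choice(shape)  # hypothetical helper
```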


Thanks for the info, that’s very interesting and good to know!