flag and gave it a try using a simple ResNet18 with MNIST. It turns out setting the flag to True actually results in a 2x speedup (~3 min instead of ~7 min). I ran the exact same code twice with and without

torch.backends.cudnn.deterministic = True

and was wondering how this is possible! I expected exactly the opposite. Could it be due to the implementation of the deterministic convolution operation, i.e., that it is simply faster on smaller networks? GPU memory usage is ~820 MB in both cases.

Unfortunately, this is expected in some cases where the cuDNN heuristics are wildly wrong. With benchmark = False and deterministic = True, the default algorithm is used (implicit precomp GEMM, iirc). With benchmark = False and deterministic = False (vanilla), the cuDNN heuristics are called to pick an algorithm, which may or may not be better than always using the default.
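
If you want to check this on your own setup, a rough timing sketch along these lines might help. It is not the ResNet18/MNIST code from the question: the small conv stack, the input shape, and the helper name `time_conv` are all made up for illustration, and the numbers will depend heavily on the GPU, cuDNN version, and layer shapes. On a machine without CUDA the two flags have no effect on the convolutions, so the timings would only differ by noise.

```python
import time

import torch
import torch.nn as nn


def time_conv(deterministic, benchmark=False, steps=20, device=None):
    """Return average seconds per forward/backward pass of a small conv stack.

    Hypothetical helper for illustration; the model and shapes are assumptions,
    not the setup from the original question.
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    device = torch.device(device)

    # These flags only influence cuDNN (CUDA) convolutions.
    torch.backends.cudnn.deterministic = deterministic
    torch.backends.cudnn.benchmark = benchmark

    torch.manual_seed(0)
    model = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(16, 16, kernel_size=3, padding=1),
        nn.ReLU(),
    ).to(device)
    x = torch.randn(32, 1, 28, 28, device=device)  # MNIST-sized batch

    def step():
        model.zero_grad()
        model(x).sum().backward()

    # Warm-up so one-time setup cost is not included in the measurement.
    for _ in range(3):
        step()

    # CUDA kernels run asynchronously, so synchronize around the timed region.
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(steps):
        step()
    if device.type == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / steps


if __name__ == "__main__":
    for det in (False, True):
        avg = time_conv(deterministic=det)
        print(f"deterministic={det}: {avg * 1e3:.3f} ms per step")
```

Adding a third run with `benchmark=True` would show whether cuDNN's autotuning finds something faster than both the heuristic pick and the default algorithm for these shapes.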