I just played around a bit with a large architecture and found that my network trains faster when I turn off cudnn.benchmark, even though the network has a fixed input size. To see whether this is specific to my architecture, I tested the speed with some of the provided torchvision models. The same holds for all of them: cudnn.benchmark slows both testing and training down.
Code to reproduce:
import torch
import torchvision.models as models
import timeit

torch.manual_seed(0)
torch.backends.cudnn.benchmark = False  # flip to True for the comparison below

model = models.densenet121(pretrained=True)
model.eval()
model.cuda()

in_tensor = torch.empty([16, 3, 224, 224]).cuda()

def bench():
    _ = model(in_tensor)

print(timeit.timeit("bench()", setup="from __main__ import bench", number=100))
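Since, as far as I understand, benchmark mode runs an algorithm search on the first forward pass for a given input shape, I also tried a variant with a warm-up call before the timed loop and with inference wrapped in torch.no_grad(). This is only a sketch of what I mean; the numbers below were measured with the original script above.

# Variant with a warm-up pass, so the one-time cuDNN algorithm search
# (when benchmark=True) is not counted in the timed loop.
import torch
import torchvision.models as models
import timeit

torch.backends.cudnn.benchmark = True  # or False, for comparison

model = models.densenet121(pretrained=True).eval().cuda()
in_tensor = torch.randn(16, 3, 224, 224).cuda()

def bench():
    with torch.no_grad():  # inference only, no autograd bookkeeping
        _ = model(in_tensor)

bench()                    # warm-up: triggers the algorithm search once
torch.cuda.synchronize()   # make sure the warm-up work has actually finished
print(timeit.timeit("bench()", setup="from __main__ import bench", number=100))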
If I run it with cudnn.benchmark = True, I measure 4.1 seconds; with cudnn.benchmark = False, the program finishes after 3.2 seconds. That is quite a difference, and when training, the gap is even bigger.
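One thing I am not sure about: CUDA calls are asynchronous, so timeit might return before all queued kernels have finished. Here is a sketch of how I would double-check the timings with explicit synchronization and CUDA events (assuming bench is defined as above):

import torch

def timed_run(fn, iters=100):
    torch.cuda.synchronize()                 # drain any pending GPU work first
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()                 # wait until the recorded work is done
    return start.elapsed_time(end) / 1000.0  # elapsed_time is in milliseconds

print(timed_run(bench))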
I am running PyTorch 0.4.1 installed from conda, with CUDA 9.0 and cuDNN 7.1.2, on Ubuntu 18.04. The GPU is a GTX 1080 Ti.
Am I doing something wrong? If not, then why is cudnn.benchmark enabled by default?