I just played around a bit with a large architecture and found that my network trains faster when I turn off cudnn.benchmark, even though the network has a fixed input size. To see whether this is specific to my architecture, I tested the speed with some of the provided torchvision models. The same holds for all of them: cudnn.benchmark slows both testing and training down.
Code to reproduce:
import torch
import torchvision.models as models
import timeit

torch.manual_seed(0)
torch.backends.cudnn.benchmark = False  # flip to True for the comparison below

model = models.densenet121(pretrained=True)
model.eval()
model.cuda()

in_tensor = torch.empty([16, 3, 224, 224]).cuda()

def bench():
    _ = model(in_tensor)

print(timeit.timeit("bench()", setup="from __main__ import bench", number=100))
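Since, as far as I understand, benchmark mode runs an algorithm search on the first forward pass for a given input shape, I also tried a variant with a warm-up call before the timed loop and with inference wrapped in torch.no_grad(). This is only a sketch of what I mean; the numbers below were measured with the original script above.

# Variant with a warm-up pass, so the one-time cuDNN algorithm search
# (when benchmark=True) is not counted in the timed loop.
import torch
import torchvision.models as models
import timeit

torch.backends.cudnn.benchmark = True  # or False, for comparison

model = models.densenet121(pretrained=True).eval().cuda()
in_tensor = torch.randn(16, 3, 224, 224).cuda()

def bench():
    with torch.no_grad():  # inference only, no autograd bookkeeping
        _ = model(in_tensor)

bench()                    # warm-up: triggers the algorithm search once
torch.cuda.synchronize()   # make sure the warm-up work has actually finished
print(timeit.timeit("bench()", setup="from __main__ import bench", number=100))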
If I run it with cudnn.benchmark = True, I measure 4.1 seconds; with cudnn.benchmark = False, the program finishes after 3.2 seconds. That is quite a difference, and when training, the gap is even bigger.
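One thing I am not sure about: CUDA calls are asynchronous, so timeit might return before all queued kernels have finished. Here is a sketch of how I would double-check the timings with explicit synchronization and CUDA events (assuming bench is defined as above):

import torch

def timed_run(fn, iters=100):
    torch.cuda.synchronize()                 # drain any pending GPU work first
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()                 # wait until the recorded work is done
    return start.elapsed_time(end) / 1000.0  # elapsed_time is in milliseconds

print(timed_run(bench))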
I am running PyTorch 0.4.1 installed from conda, with CUDA 9.0 and cuDNN 7.1.2, on Ubuntu 18.04. The GPU is a GTX 1080 Ti.
Am I doing something wrong? If not, then why is cudnn.benchmark enabled by default?