What does torch.backends.cudnn.benchmark do?


(John Zhang) #1

What's the difference between setting it to True and False?


(Alban D) #2

This flag allows you to enable the built-in cuDNN auto-tuner to find the best algorithm to use for your hardware.
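A minimal sketch of where the flag usually goes: set it once near the top of the script, before the first forward pass. The small convolutional model and the input shape are arbitrary placeholders, not from this thread. Without a CUDA GPU the flag is still accepted but has no effect, because it only influences cuDNN-backed ops.

```python
import torch
import torch.nn as nn

# Enable the cuDNN auto-tuner (it is False by default).
torch.backends.cudnn.benchmark = True

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder model: any network with cuDNN-backed ops such as convolutions.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1),
).to(device)

x = torch.randn(8, 3, 32, 32, device=device)  # same input shape every iteration
for _ in range(3):
    # On CUDA, the first call for this shape runs the benchmark;
    # later calls reuse the algorithm it picked.
    y = model(x)

print(y.shape)  # torch.Size([8, 16, 32, 32])
```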


(Francisco Massa) #3

It enables benchmark mode in cuDNN.
Benchmark mode is good whenever the input sizes for your network do not vary. In that case, cuDNN will look for the optimal set of algorithms for that particular configuration (which takes some time). This usually leads to faster runtime.
But if your input sizes change at each iteration, cuDNN will benchmark every time a new size appears, possibly leading to worse runtime performance.
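To make the trade-off concrete, here is a toy, pure-Python model of the behaviour described above. It is not how cuDNN or PyTorch is actually implemented: the class `ToyAutotuner`, the two "algorithms" and their cost formulas are all made up. It only shows that the search runs once per new input configuration and is then cached, so a fixed size pays the cost once while a changing size pays it on every new shape.

```python
from typing import Callable, Dict, Tuple

Shape = Tuple[int, ...]

class ToyAutotuner:
    """Hypothetical stand-in for benchmark mode: try every candidate once per shape, cache the winner."""

    def __init__(self, candidates: Dict[str, Callable[[Shape], float]]):
        self.candidates = candidates        # name -> simulated cost for a given shape
        self.cache: Dict[Shape, str] = {}   # shape -> chosen algorithm
        self.benchmark_runs = 0             # how many times the slow search ran

    def choose(self, shape: Shape) -> str:
        if shape not in self.cache:
            # Expensive step: evaluate every candidate for this new configuration.
            self.benchmark_runs += 1
            self.cache[shape] = min(self.candidates, key=lambda name: self.candidates[name](shape))
        return self.cache[shape]

def numel(shape: Shape) -> int:
    n = 1
    for d in shape:
        n *= d
    return n

# Made-up cost models for two fictitious algorithms.
candidates = {
    "algo_a": lambda s: 1.0 * numel(s),
    "algo_b": lambda s: 0.5 * numel(s) + 5000,
}

fixed = ToyAutotuner(candidates)
for _ in range(100):
    fixed.choose((8, 3, 32, 32))          # same size every iteration

varying = ToyAutotuner(candidates)
for batch in range(1, 101):
    varying.choose((batch, 3, 32, 32))    # a new size every iteration

print(fixed.benchmark_runs, varying.benchmark_runs)  # 1 100
```

Note that different shapes can also end up with different winners (here `algo_a` for very small batches, `algo_b` for larger ones), which is why the choice is cached per configuration rather than once globally.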


#4

I tried using it with torchvision’s resnet101, but it gives worse performance. Is that normal? @fmassa @albanD


(Francisco Massa) #5

It depends on the task. If your input size changes a lot, it might hurt runtime; if not, it should be much faster.
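One way to check whether the flag helps a particular model is to time both settings while discarding the first few iterations, since with benchmark=True those include the one-off search. The helper `time_forward`, the warm-up and iteration counts, and the small model below are assumptions for illustration. On CPU the flag has no effect, so the comparison is only meaningful on a CUDA device, and running each setting in a separate process gives a cleaner comparison.

```python
import time
import torch
import torch.nn as nn

def time_forward(model: nn.Module, x: torch.Tensor, benchmark: bool,
                 warmup: int = 3, iters: int = 10) -> float:
    """Hypothetical helper: average seconds per forward pass with the given benchmark setting."""
    torch.backends.cudnn.benchmark = benchmark
    with torch.no_grad():
        for _ in range(warmup):        # warm-up absorbs the one-off autotuning cost
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()   # CUDA runs asynchronously; wait before starting the clock
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()   # make sure all timed work has finished
    return (time.perf_counter() - start) / iters

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(64, 64, 3, padding=1)).to(device).eval()
x = torch.randn(8, 3, 32, 32, device=device)

for flag in (False, True):
    print(f"benchmark={flag}: {time_forward(model, x, flag) * 1e3:.2f} ms/iter")
```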


(Mjchen611) #6

What's the difference between setting cudnn.enabled to True and False?


(Trevor Standley) #7

Does it work to turn it on for training (where I have a constant input size) and turn it off for validation (where my input size isn’t constant)? Do I just set the flag before running validation?


(Alban D) #8

Yes, it will work to change the value!
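A minimal sketch of switching the flag per phase, as described above: on for training with a fixed input size, off for validation where sizes vary. The model, placeholder loss, optimizer and the randomly sized validation inputs are illustrative assumptions, not from the thread.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Conv2d(3, 8, kernel_size=3, padding=1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_epoch() -> None:
    torch.backends.cudnn.benchmark = True      # constant input size: let cuDNN autotune
    model.train()
    for _ in range(5):
        x = torch.randn(4, 3, 32, 32, device=device)
        loss = model(x).pow(2).mean()          # placeholder loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def validate() -> None:
    torch.backends.cudnn.benchmark = False     # sizes vary: avoid re-benchmarking each new size
    model.eval()
    with torch.no_grad():
        for size in (24, 40, 56):              # a different spatial size per batch
            model(torch.randn(1, 3, size, size, device=device))

train_epoch()
validate()
print(torch.backends.cudnn.benchmark)  # False
```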


#9

I have a question.

For example, I have a net1 whose input sizes don’t vary, and another nn.Module named net_Loss whose input sizes do vary. I only need to optimize net1’s parameters. Should I set cudnn.benchmark to True or False?
Thank you!


(Salih Karagoz) #10

Hello,
IMHO, if your networks are connected to each other and net1 is the first network, you should use cudnn.benchmark = True.
Also, you mentioned that you only need to optimize net1’s parameters, so I think you should use it.


#11

Hello,
@albanD
When we set cudnn.benchmark=True

  1. How can I get access to the whole family of algorithms that could potentially be executed (i.e. display them)?

  2. How can I print the cuDNN algorithm that is run at each iteration?

Thank you


(Alban D) #12
  1. I’m afraid you can’t. That would depend on the operation you’re performing. For example, for the forward pass of a convolution, you can find the list here.
  2. You would have to add this print directly in the cpp code linked above.

(Diego) #13

Can you show an example of how to use this line? Can it be placed anywhere in the code, or does it have to come before the forward pass?


(Ruotian(RT) Luo) #17

Can I have only part of the inference benchmarked?

Like this:

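import torch

x = func1(x)                              # func1/func2: arbitrary parts of the model
x = x[:20]                                # batch size is fixed from here on
torch.backends.cudnn.benchmark = True     # autotune only the ops inside func2
x = func2(x)
torch.backends.cudnn.benchmark = False    # back to the default (no autotuning)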
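A self-contained variant of the snippet above, using a small context manager so the previous value of the flag is restored even if the benchmarked part raises. The helper name `cudnn_benchmark` and the two convolution layers standing in for `func1` and `func2` are hypothetical.

```python
from contextlib import contextmanager

import torch
import torch.nn as nn

@contextmanager
def cudnn_benchmark(enabled: bool = True):
    """Hypothetical helper: temporarily set torch.backends.cudnn.benchmark, then restore it."""
    previous = torch.backends.cudnn.benchmark
    torch.backends.cudnn.benchmark = enabled
    try:
        yield
    finally:
        torch.backends.cudnn.benchmark = previous

device = "cuda" if torch.cuda.is_available() else "cpu"
func1 = nn.Conv2d(3, 8, 3, padding=1).to(device)   # placeholder for the first part of the model
func2 = nn.Conv2d(8, 8, 3, padding=1).to(device)   # placeholder for the part to benchmark

x = torch.randn(32, 3, 16, 16, device=device)
with torch.no_grad():
    x = func1(x)              # runs with whatever setting is active (False by default)
    x = x[:20]                # fixed batch size from here on
    with cudnn_benchmark(True):
        x = func2(x)          # only this call runs with benchmark mode on
print(x.shape, torch.backends.cudnn.benchmark)  # torch.Size([20, 8, 16, 16]) False
```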

(Pablo Rr100) #18

What if you are training different networks in the same script on the same input sizes?
I am training a single ResNet44 and, after that, an ensemble of three ResNet18s on CIFAR10.

How should I handle cudnn.benchmark = True?
Should I set it first at the beginning of the ResNet44 training and then again at the beginning of the first ResNet18's training?

Thanks!