What does torch.backends.cudnn.benchmark do?

John_Zhang · August 8, 2017, 4:21pm

What's the difference between setting it to True or False?

albanD · August 8, 2017, 4:42pm

This flag allows you to enable the inbuilt cudnn auto-tuner to find the best algorithm to use for your hardware.

fmassa · August 8, 2017, 4:43pm

It enables benchmark mode in cudnn.
Benchmark mode is good whenever the input sizes for your network do not vary. This way, cudnn will look for the optimal set of algorithms for that particular configuration (which takes some time). This usually leads to faster runtime.
But if your input sizes change at each iteration, then cudnn will benchmark every time a new size appears, possibly leading to worse runtime performance.
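The caching behaviour described above can be illustrated with a toy model. This is only a sketch in plain Python, not how cuDNN is actually implemented: the `ToyAutotuner` class, its algorithm names, and its cost figures are all invented for illustration. It shows why a fixed input shape pays the search cost once, while a stream of new shapes pays it again and again.

```python
class ToyAutotuner:
    """Toy stand-in for an auto-tuner that caches one choice per input shape.

    Hypothetical: the algorithm names and costs are invented for illustration.
    """

    def __init__(self):
        self.cache = {}      # input shape -> chosen algorithm name
        self.searches = 0    # number of (expensive) searches performed

    def _search(self, shape):
        # Pretend to time every candidate on this shape and keep the cheapest.
        self.searches += 1
        n_elements = 1
        for dim in shape:
            n_elements *= dim
        costs = {
            "algo_a": 100 + n_elements * 0.001,  # high setup cost, cheap per element
            "algo_b": 1 + n_elements * 0.002,    # low setup cost, pricier per element
        }
        return min(costs, key=costs.get)

    def pick(self, shape):
        # Search only the first time a shape is seen; reuse the result afterwards.
        if shape not in self.cache:
            self.cache[shape] = self._search(shape)
        return self.cache[shape]


fixed = ToyAutotuner()
for _ in range(100):
    fixed.pick((32, 3, 224, 224))            # same shape every iteration

varying = ToyAutotuner()
for step in range(100):
    varying.pick((32, 3, 224 + step, 224))   # new shape every iteration

print(fixed.searches, varying.searches)      # prints: 1 100
```

With a constant shape the search runs once and is amortized over all iterations; with a new shape at every step the search runs every time, which is the situation where benchmark mode can end up slower.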

yxchng · August 31, 2017, 1:47am

I tried using it with torchvision’s resnet101 but it gives worse performance. Is it normal? @fmassa @albanD

fmassa · September 1, 2017, 9:32am

It depends on the task. If your input size is changing a lot, then it might hurt runtime; if not, it should be much faster.

mjchen611 · September 9, 2017, 3:06am

What's the difference between setting cudnn.enabled to True or False?

tstandley · February 17, 2018, 7:30am

Does it work to turn it on for training (where I have a constant input size) and turn it off for validation (where my input size isn’t constant)? Do I just set the flag before doing my validation?

albanD · February 19, 2018, 2:11pm

Yes, it will work to change the value!
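One possible way to do the switch is a small context manager that sets the flag and restores the previous value afterwards. This is a minimal sketch: `cudnn_benchmark` is a hypothetical helper name (not part of PyTorch), and the training and validation loops are left as placeholders.

```python
from contextlib import contextmanager

import torch


@contextmanager
def cudnn_benchmark(enabled):
    """Temporarily set torch.backends.cudnn.benchmark, then restore it.

    Hypothetical helper, written for this example only.
    """
    previous = torch.backends.cudnn.benchmark
    torch.backends.cudnn.benchmark = enabled
    try:
        yield
    finally:
        # Restore the earlier value even if the body raises.
        torch.backends.cudnn.benchmark = previous


torch.backends.cudnn.benchmark = True   # training: constant input size

# ... training loop would go here ...

with cudnn_benchmark(False):            # validation: variable input sizes
    inside = torch.backends.cudnn.benchmark
    # ... validation loop would go here ...

after = torch.backends.cudnn.benchmark  # back to the training setting
```

Plain assignments before and after validation would work just as well; the context manager only guards against forgetting to switch the flag back.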

Alpha · March 16, 2018, 8:01am

I have a question.

E.g. I have a net1 whose input sizes don’t vary, and I have another nn.Module named net_Loss whose input sizes do vary. I only need to optimize net1’s parameters. So should I set cudnn.benchmark to True or False?
Thank you !

salihkaragoz · April 20, 2018, 7:10am

Hello,
IMHO, if your networks are connected to each other and net1 is the first network, you should use cudnn.benchmark = True.
Also, since you mentioned that you only need to optimize net1’s parameters, I think you should use it.

DeepLearner17 · April 20, 2018, 11:36am

Hello,
@albanD
When we set cudnn.benchmark=True:

How can I get access to the whole family of algorithms that could potentially be executed (to display them)?
How can I print the cudnn algorithm run at each iteration?

Thank you

albanD · April 20, 2018, 12:45pm

I’m afraid you can’t. That would depend on the operation you’re performing. For example, for forward pass of convolution, you can find the list here.
You would have to add this print directly in the cpp code linked above.

Diego · May 8, 2018, 3:37pm

Can you show an example of how to use this line? Can you place it anywhere in the code, or does it have to go before the forward pass?
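A minimal sketch of one common placement: the flag is a global setting, so it is typically set once near the top of the script, before the first forward pass. The model, layer sizes, and input shape below are hypothetical and only serve to give the example a convolution to run. On a machine without a GPU the script falls back to the CPU, where the flag has no effect because cuDNN is only used for CUDA tensors.

```python
import torch
import torch.nn as nn

# Global switch: set it once, before the first forward pass on the GPU.
torch.backends.cudnn.benchmark = True

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical small model, used only to have a convolution to run.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
).to(device)

x = torch.randn(4, 3, 32, 32, device=device)  # fixed input size
with torch.no_grad():
    y = model(x)

print(y.shape)
```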

ruotianluo · July 6, 2018, 9:24pm

Can I have only part of the inference benchmarked?

Like this:

x = func1(x)
x = x[:20]
torch.backends.cudnn.benchmark = True
x = func2(x)
torch.backends.cudnn.benchmark = False

PabloRR100 · October 4, 2018, 4:01pm

What if you are training different networks in the same script, all exposed to the same input sizes?
I am training a single ResNet44 and after that an ensemble of 3 ResNet18s on CIFAR10.

How do I manage cudnn.benchmark = True?
Should I set it first at the beginning of the ResNet44 training and then once again at the beginning of the first ResNet18?

Thanks!

kuzand · January 17, 2019, 10:08am

Hi. Can you please clarify what you mean by “input size”? Is it the image size, like 224? Since the input size of the network is always fixed and the images are resized to the same size before feeding them to the network, in which cases can it vary?

fabianjul · March 15, 2019, 1:04pm

When you have a fully convolutional network

Oli · April 29, 2019, 6:33pm

I find that torch.backends.cudnn.benchmark increases the speed for my YOLOv3 model by a lot, like 30-40%. Furthermore, it lowers the memory footprint after it completes the benchmark.

It even works when my input images vary in size between batches, neat! I was thinking about having the network run on a few smaller torch.randn(...) inputs to benchmark on, and then starting the training. I hope that this would allow me to increase the batch size, since the memory footprint is lower after the benchmark. What do you guys think?
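The warm-up idea could be sketched roughly as follows. This is an untested assumption rather than a confirmed recipe: the stand-in model, the `warm_up` helper, and the list of shapes are all hypothetical. Since, as explained earlier in the thread, cudnn benchmarks again whenever a new size appears, the dummy inputs here use the shapes expected during training rather than smaller ones. Whether this actually leaves room for a larger batch size is not verified.

```python
import torch
import torch.nn as nn

torch.backends.cudnn.benchmark = True
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in model; replace with the real network.
model = nn.Conv2d(3, 16, kernel_size=3, padding=1).to(device)

# Assumed set of input shapes expected during training.
expected_shapes = [(8, 3, 64, 64), (8, 3, 96, 96)]


def warm_up(model, shapes, device):
    """Run one forward and backward pass per shape on random data."""
    seen = []
    for shape in shapes:
        dummy = torch.randn(*shape, device=device)
        out = model(dummy)
        out.sum().backward()        # also exercise the backward pass
        seen.append(tuple(out.shape))
    model.zero_grad()               # discard gradients from the dummy data
    if device.type == "cuda":
        torch.cuda.synchronize()    # wait for the warm-up kernels to finish
    return seen


warmed = warm_up(model, expected_shapes, device)
# ... real training would start here ...
```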

Shubhankar · January 13, 2020, 5:55pm

Is this still the only way to do it?

albanD · January 13, 2020, 6:02pm

Hi,

I don’t think this has changed, I’m afraid. But there may have been changes to this code, so I’m not sure.