What's the difference between setting it to True and False?
This flag enables cuDNN's built-in auto-tuner, which finds the best algorithm to use for your hardware.
It enables benchmark mode in cuDNN.
Benchmark mode is good whenever the input sizes for your network do not vary. In that case, cuDNN searches for the optimal set of algorithms for that particular configuration (which takes some time), and this usually leads to faster runtime.
But if your input sizes change at each iteration, then cuDNN will benchmark every time a new size appears, possibly leading to worse runtime performance.
It depends on the task. If your input size changes a lot, it might hurt runtime. If it doesn't, it should be much faster.
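To illustrate why varying input sizes can hurt, here is a toy model in plain Python. It assumes the auto-tuner runs its expensive search once per distinct input shape and caches the result. The class and function names are made up for illustration, and this is not cuDNN's real implementation (the real cache presumably depends on more than the shape).

```python
# Toy model of benchmark mode: an expensive search runs once per distinct
# input shape and the chosen algorithm is cached. NOT cuDNN's actual code.


class ToyAutoTuner:
    def __init__(self):
        self.cache = {}          # input shape -> name of the chosen algorithm
        self.benchmark_runs = 0  # how many times the expensive search ran

    def pick_algorithm(self, input_shape):
        if input_shape not in self.cache:
            # New shape: pay the benchmarking cost and remember the result.
            self.benchmark_runs += 1
            self.cache[input_shape] = f"algo_for_{input_shape}"
        return self.cache[input_shape]


def count_benchmark_runs(shapes):
    """Return how many searches a fresh tuner performs over a sequence of shapes."""
    tuner = ToyAutoTuner()
    for shape in shapes:
        tuner.pick_algorithm(shape)
    return tuner.benchmark_runs


# 100 iterations with one fixed shape vs. 100 iterations with 100 different shapes.
fixed = [(32, 3, 224, 224)] * 100
varying = [(32, 3, 200 + i, 200 + i) for i in range(100)]

fixed_runs = count_benchmark_runs(fixed)
varying_runs = count_benchmark_runs(varying)
```

With a fixed shape the search cost is paid once and amortized over all iterations. With a new shape every iteration it is paid every time.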
What's the difference between setting cudnn.enabled to True and False?
Does it work to turn it on for training (where I have a constant input size) and turn it off for validation (where my input size isn’t constant)? Do I just set the constant before doing my validation?
Yes, it will work to change the value!
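One way to switch the flag just for validation and restore it afterwards is a small context manager. This is only a sketch: `temporary_flag` is a hypothetical helper, not part of PyTorch, and a stand-in object is used below so the snippet runs without a GPU. In a real script you would pass `torch.backends.cudnn` instead of the stand-in.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_flag(obj, name, value):
    """Set obj.<name> to value inside the with-block, then restore the old value."""
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield
    finally:
        # Restore even if the body raises.
        setattr(obj, name, previous)


# Stand-in for torch.backends.cudnn (assumption: benchmark was enabled for training).
backend = SimpleNamespace(benchmark=True)

with temporary_flag(backend, "benchmark", False):
    # Validation with varying input sizes would go here.
    inside = backend.benchmark

after = backend.benchmark

# With PyTorch this would look like:
#   with temporary_flag(torch.backends.cudnn, "benchmark", False):
#       validate(model, val_loader)   # validate and val_loader are hypothetical
```

Plain assignments before and after validation do the same thing. The context manager just makes it harder to forget to switch the flag back.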
I have a question.
For example, I have a network net1 whose input sizes don’t vary, and another nn.Module named net_Loss whose input sizes do vary. I only need to optimize net1’s parameters. So should I set cudnn.benchmark to True or False?
Thank you !
IMHO, if your networks are connected to each other and net1 is the first network, you should use cudnn.benchmark = True.
Also, since you mentioned that you only need to optimize net1’s parameters, I think you should use it.
When we set cudnn.benchmark=True:
How can I get access to the whole family of algorithms that can potentially be executed (to display them)?
How can I print the cuDNN algorithm run at each iteration?
- I’m afraid you can’t. That would depend on the operation you’re performing. For example, for the forward pass of a convolution, you can find the list here.
- You would have to add this print directly in the cpp code linked above.
Can you show an example on how to use this line? Can you place it anywhere in the code? Or before the forward pass?
Can I have only part of the inference benchmarked?
x = func1(x)
x = x[:20]
torch.backends.cudnn.benchmark = True
x = func2(x)
torch.backends.cudnn.benchmark = False
What if you are training different networks in the same script exposed to the same input sizes?
I am training a single ResNet44 and, after that, an ensemble of three ResNet18s on CIFAR10.
How do I manage cudnn.benchmark = True? Do I set it first at the beginning of the ResNet44 training and then once again at the beginning of the first ResNet18 training?
Hi. Can you please clarify what you mean by “input size”? Is it the image size, like 224? Since the input size of the network is always fixed and the images are resized to the same size before being fed to the network, in which cases can it vary?
When you have a fully convolutional network.
I find that torch.backends.cudnn.benchmark increases the speed of my YOLOv3 model by a lot, around 30-40%. Furthermore, it lowers the memory footprint after it completes the benchmark.
It even works when my input images vary in size between batches, neat! I was thinking about having the network run on a few smaller torch.randn(...) inputs to benchmark on, and then starting the training. I hope that this could allow me to increase the batch size, since the memory footprint is lower after the benchmark. What do you guys think?
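A rough sketch of that warm-up idea, in case it helps the discussion. `warm_up` is a hypothetical helper, and the model and input factory below are stand-ins so the snippet runs without PyTorch. Whether this actually allows a larger batch size is untested here. Note too that this sketch only runs forward passes, so anything tuned during the backward pass would presumably still be benchmarked on the first real training step.

```python
def warm_up(model, shapes, make_input, repeats=2):
    """Call the model on dummy inputs of each expected shape, a few times each."""
    for shape in shapes:
        for _ in range(repeats):
            model(make_input(*shape))


# Stand-ins so the sketch is runnable on its own:
seen_shapes = []


def fake_model(x):
    # Record what we were called with instead of running a network.
    seen_shapes.append(x)


def fake_input(*shape):
    # A real script would create a tensor here; we just return the shape.
    return shape


expected_shapes = [(8, 3, 320, 320), (8, 3, 416, 416)]
warm_up(fake_model, expected_shapes, fake_input)

# With PyTorch, assuming a CUDA model, this could look like:
#   torch.backends.cudnn.benchmark = True
#   warm_up(model, expected_shapes, lambda *s: torch.randn(*s, device="cuda"))
```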
Is this still the only way to do it?
I’m afraid I don’t think this has changed. But there might have been changes to this code, so I’m not sure.