What does torch.backends.cudnn.benchmark do?

Is it possible to interact with the cuDNN API from PyTorch? The following function returns the convolution algorithm to use, as selected by the CUDNN_CONVOLUTION_FWD_PREFER_FASTEST preference:

cudnnGetConvolutionForwardAlgorithm(cudnn,
                                    input_descriptor,
                                    kernel_descriptor,
                                    convolution_descriptor,
                                    output_descriptor,
                                    CUDNN_CONVOLUTION_FWD_PREFER_FASTEST,
                                    /*memoryLimitInBytes=*/0,
                                    &convolution_algorithm);

Or is it possible to use cuDNN logs somehow?
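
In case it helps, cuDNN itself can log its API calls (including the chosen algorithms) via environment variables; a minimal sketch, assuming a cuDNN version that supports CUDNN_LOGINFO_DBG/CUDNN_LOGDEST_DBG and that they are set before cuDNN is initialized:

    import os

    # Enable cuDNN API logging; these variables must be set before cuDNN
    # is initialized, i.e., before the first CUDA/cuDNN call in the process.
    os.environ["CUDNN_LOGINFO_DBG"] = "1"
    os.environ["CUDNN_LOGDEST_DBG"] = "cudnn.log"  # or "stdout" / "stderr"

    import torch

    conv = torch.nn.Conv2d(3, 16, kernel_size=3).cuda()
    x = torch.randn(1, 3, 224, 224, device="cuda")
    out = conv(x)  # the cuDNN calls (and selected algorithms) appear in cudnn.log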

cc @ptrblck, who is more experienced with cuDNN 🙂

FYI: this issue is still open on GitHub.

Should this be enabled:

  • During training?
  • During inference?
  • Both?

I care only about inference speed.

Thank you

If your inputs always have the same size, you should enable it all the time.
But it only influences which algorithm cuDNN uses while the flag is enabled, so setting it during training does not influence inference in any way.
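
For the inference-only case, a minimal sketch of what that looks like (the single conv layer is just a stand-in for your model):

    import torch

    torch.backends.cudnn.benchmark = True  # enable the cuDNN autotuner

    model = torch.nn.Conv2d(3, 64, kernel_size=3, padding=1).cuda().eval()

    with torch.no_grad():
        x = torch.randn(1, 3, 224, 224, device="cuda")
        out = model(x)  # first call with this shape benchmarks; later calls reuse the result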

@fmassa @ptrblck
Hello. I would like to ask a few questions about the behavior of torch.backends.cudnn.benchmark = True.

  1. Does the mini-batch size matter? Many people say that benchmarking reuses the same cache if the image input size is the same, but I have not found a clear explanation of whether changing the batch size is OK.
  2. How many cache entries can it manage? For example, I might have two types of input: 224x224 and 320x320. Would constantly switching between the two input sizes require additional benchmarking, or would there be two separate cache entries?

Thank you in advance for your replies!

  1. Yes, the batch size matters, as it is part of the ConvolutionParams under which the chosen algorithm is stored.

  2. It’s using a std::unordered_map keyed by the mentioned ConvolutionParams, so no additional benchmarking would be required for these two shapes once they have been profiled.
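
To make the caching behavior concrete, here is a small illustrative sketch (the layer and shapes are arbitrary): only the first call per shape should pay the cudnnFind cost, while the repeated shapes hit the cache:

    import time
    import torch

    torch.backends.cudnn.benchmark = True
    conv = torch.nn.Conv2d(3, 64, kernel_size=3, padding=1).cuda()

    for size in (224, 320, 224, 320):
        x = torch.randn(8, 3, size, size, device="cuda")
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        conv(x)
        torch.cuda.synchronize()
        # the first 224 and first 320 iterations are slow (cudnnFind runs);
        # the repeated shapes reuse the cached algorithms and are fast
        print(f"{size}: {time.perf_counter() - t0:.4f}s")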


Hello, I have a quick question about torch.backends.cudnn.benchmark = True.

When you say the input size cannot change, does that apply to each convolution layer?

I have a UNet design using dense blocks. Since the input to each layer within a block is different, does that mean I cannot use torch.backends.cudnn.benchmark = True?
Is there any workaround for dense blocks so that I can use torch.backends.cudnn.benchmark = True?

Thanks in advance 🙂

The input shape can change, but each new input shape will rerun cudnnFind to find the fastest kernel for that shape (for every layer that sees a new input shape) and will add these kernels to a cache.

No, you can use it; each new input shape just causes a one-time slowdown.
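
So a common workaround is a warm-up pass over all expected input shapes before timed inference; a minimal sketch (the Sequential model is a stand-in for your UNet, and the shapes are just examples):

    import torch

    torch.backends.cudnn.benchmark = True

    # stand-in for your dense-block UNet; any model behaves the same way
    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 32, kernel_size=3, padding=1),
        torch.nn.ReLU(),
        torch.nn.Conv2d(32, 3, kernel_size=3, padding=1),
    ).cuda().eval()

    # warm-up: pay the one-time cudnnFind cost for every expected input shape
    with torch.no_grad():
        for shape in [(1, 3, 256, 256), (1, 3, 512, 512)]:
            model(torch.randn(*shape, device="cuda"))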


How can we use it in C++?

at::globalContext().setBenchmarkCuDNN(true); should work.
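
For example, a minimal libtorch sketch, assuming a CUDA-enabled libtorch build (the conv layer is just an example):

    #include <torch/torch.h>

    int main() {
      // equivalent of torch.backends.cudnn.benchmark = True
      at::globalContext().setBenchmarkCuDNN(true);

      torch::nn::Conv2d conv(torch::nn::Conv2dOptions(3, 64, 3).padding(1));
      conv->to(torch::kCUDA);

      auto x = torch::randn({1, 3, 224, 224}, torch::kCUDA);
      auto out = conv->forward(x);  // first call with this shape is benchmarked
      return 0;
    }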


What is the default value?

By default, benchmark is set to False.
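
You can check it directly:

    import torch

    print(torch.backends.cudnn.benchmark)  # False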
