What algorithm does PyTorch use if torch.backends.cudnn.benchmark = False is set?
@albanD Btw, when I set torch.backends.cudnn.benchmark = False, I still get 3 calls to the FFT algorithm (1 of the 4 calls), as shown in the following snippet from the cuDNN logs. That is a lot fewer than the 39 I get with torch.backends.cudnn.benchmark = True. What could be causing it?
I! CuDNN (v7601) function cudnnConvolutionForward() called:
i! handle: type=cudnnHandle_t; streamId=(nil) (defaultStream);
i! alpha: type=CUDNN_DATA_FLOAT; val=1.000000;
i! xDesc: type=cudnnTensorDescriptor_t:
i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i! nbDims: type=int; val=4;
i! dimA: type=int; val=[18432,6,14,14];
i! strideA: type=int; val=[1176,196,14,1];
i! xData: location=dev; addr=0x200154c00000;
i! wDesc: type=cudnnFilterDescriptor_t:
i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i! vect: type=int; val=0;
i! nbDims: type=int; val=4;
i! dimA: type=int; val=[16,6,5,5];
i! format: type=cudnnTensorFormat_t; val=CUDNN_TENSOR_NCHW (0);
i! wData: location=dev; addr=0x200118400a00;
i! convDesc: type=cudnnConvolutionDescriptor_t:
i! mode: type=cudnnConvolutionMode_t; val=CUDNN_CROSS_CORRELATION (1);
i! dataType: type=cudnnDataType_t; val=CUDNN_DATA_FLOAT (0);
i! mathType: type=cudnnMathType_t; val=CUDNN_DEFAULT_MATH (0);
i! reorderType: type=int; val=0;
i! arrayLength: type=int; val=2;
i! padA: type=int; val=[0,0];
i! strideA: type=int; val=[1,1];
i! dilationA: type=int; val=[1,1];
i! groupCount: type=int; val=1;
i! algo: type=cudnnConvolutionFwdAlgo_t; val=CUDNN_CONVOLUTION_FWD_ALGO_FFT (4);
i! workSpace: location=dev; addr=0x2001c0000000;
i! workSpaceSizeInBytes: type=size_t; val=679486848;
i! beta: type=CUDNN_DATA_FLOAT; val=0.000000;
i! yDesc: type=cudnnTensorDescriptor_t:
i! dataType: type=cudnnDataType_t; val=**CUDNN_DATA_FLOAT (0)**;
i! nbDims: type=int; val=4;
i! dimA: type=int; val=[18432,16,10,10];
i! strideA: type=int; val=[1600,100,10,1];
i! yData: location=dev; addr=0x2000a3800000;
i! Time: 2020-01-16T12:23:07.839722 (0d+0h+0m+25s since start)
i! Process=86375; Thread=86375; GPU=0; Handle=0x13e4302b0; StreamId=(nil) (defaultStream).
I am running the PyTorch tutorial at https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#training-an-image-classifier with and without torch.backends.cudnn.benchmark, and with the following cuDNN logging settings:
export CUDNN_LOGINFO_DBG=1
export CUDNN_LOGDEST_DBG=log.txt
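To compare the two runs without reading the log by hand, something like the following could tally which algorithms appear in the file written by CUDNN_LOGDEST_DBG. This is only a rough sketch: `count_algorithms` and `ALGO_RE` are names made up here, and the regular expression assumes every algorithm line has the same `algo: type=...; val=NAME (n);` shape as the one in the snippet above.

```python
import re
from collections import Counter

# Assumed line shape, copied from the log snippet above:
#   i! algo: type=cudnnConvolutionFwdAlgo_t; val=CUDNN_CONVOLUTION_FWD_ALGO_FFT (4);
ALGO_RE = re.compile(r"algo: type=(\w+); val=(\w+) \(\d+\);")


def count_algorithms(lines):
    """Count how often each cuDNN algorithm name appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = ALGO_RE.search(line)
        if match:
            # group(2) is the algorithm name, e.g. CUDNN_CONVOLUTION_FWD_ALGO_FFT
            counts[match.group(2)] += 1
    return counts


# Small self-contained demo on one line taken from the log above.
sample = [
    "i! algo: type=cudnnConvolutionFwdAlgo_t; val=CUDNN_CONVOLUTION_FWD_ALGO_FFT (4);",
    "i! workSpaceSizeInBytes: type=size_t; val=679486848;",
]
print(count_algorithms(sample))
```

For the real log, passing an open file works the same way, e.g. `count_algorithms(open("log.txt"))`, once for the benchmark = True run and once for the benchmark = False run.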
cuDNN can still be used if torch.backends.cudnn.enabled = True.
If you don’t want to use cuDNN, you should set this flag to False to use the native PyTorch methods.
When cudnn.benchmark is set to True, the first iterations will be slower, as some internal benchmarking is done to find the fastest kernels for your current workload, which would explain the additional function calls you are seeing.
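To make the difference between the two flags concrete, here is a minimal sketch. `configure_cudnn` is a hypothetical helper written for this post, not part of PyTorch; only the two `torch.backends.cudnn` attributes come from the library.

```python
import torch


def configure_cudnn(use_cudnn: bool, autotune: bool) -> None:
    """Hypothetical helper that sets the two flags discussed above."""
    # enabled controls whether PyTorch may call into cuDNN at all.
    torch.backends.cudnn.enabled = use_cudnn
    # benchmark controls only the up-front benchmarking of candidate kernels.
    torch.backends.cudnn.benchmark = autotune


# cuDNN stays on, benchmarking is off: cuDNN calls can still show up in the log.
configure_cudnn(use_cudnn=True, autotune=False)
print(torch.backends.cudnn.enabled, torch.backends.cudnn.benchmark)

# cuDNN is off: PyTorch falls back to its native methods.
configure_cudnn(use_cudnn=False, autotune=False)
print(torch.backends.cudnn.enabled, torch.backends.cudnn.benchmark)
```

So with benchmark = False alone, seeing cudnnConvolutionForward() in the log is expected; only the second configuration should keep cuDNN out of the picture.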
I get that, but what I observed is that the cuDNN logs still contain some cuDNN algorithm calls even though I had set torch.backends.cudnn.benchmark = False, as shown above. What algorithm does PyTorch use when torch.backends.cudnn is not used?