Mysterious CUDNN kernel


I am profiling a PyTorch model with a single convolution layer using nvprof.

For simplicity, the model consists of one Conv1d + ReLU layer.
Input size: [32, 64, 1664], kernel_size: 11
cudnn.benchmark = True is enabled.
The model was warmed up before profiling.
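For anyone who wants to reproduce the setup, here is a minimal sketch of what is described above. The number of output channels, the use of half precision (inferred only from the fp16 in the kernel name), the number of warm-up iterations, and the function names are my assumptions and are not taken from the original script.

```python
import torch
import torch.nn as nn


def build_model(out_channels: int = 64) -> nn.Sequential:
    # One Conv1d + ReLU layer. in_channels=64 and kernel_size=11 follow the
    # post; out_channels=64 is an assumption (the post does not state it).
    return nn.Sequential(
        nn.Conv1d(64, out_channels, kernel_size=11),
        nn.ReLU(),
    )


def warm_up_and_run(warmup_iters: int = 10) -> torch.Tensor:
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")
    # Half precision on the GPU is an assumption based on the kernel name;
    # the CPU fallback uses float32 so the sketch still runs without a GPU.
    dtype = torch.float16 if use_cuda else torch.float32

    # Let cuDNN benchmark the available algorithms and pick the fastest one.
    torch.backends.cudnn.benchmark = True

    model = build_model().to(device=device, dtype=dtype).eval()
    x = torch.randn(32, 64, 1664, device=device, dtype=dtype)

    with torch.no_grad():
        # Warm-up iterations, so algorithm selection is not part of the
        # region that is being timed.
        for _ in range(warmup_iters):
            model(x)
        if use_cuda:
            torch.cuda.synchronize()

        # The forward pass that should be inspected in the profiler.
        y = model(x)
        if use_cuda:
            torch.cuda.synchronize()
    return y


if __name__ == "__main__":
    out = warm_up_and_run()
    print(out.shape)
```

Saved as a script (the file name is up to you), this can be run under nvprof, for example with `nvprof python script.py`, to see which kernels are launched.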

In the profiler, I see that the volta_fp16_s884cudnn_fp16_256x128_ldg8_relu_f2f_exp_small_nhwc2nchw_tn_v1 function takes most of the time.

Could you please tell me:
What is the purpose of this function?
Why are there no references to convolutions in the name of the function?

Link to the JSON file with the nvprof tracing info: profile_one_layer_pytorch.sqlite.json (GitHub)
The file can be opened in chrome://tracing

@ptrblck would you have any advice? What do exp and tn mean? Is it some conv implementation?

The tn could denote the transposes, as described here.

@11245 Is there any particular issue with this kernel, i.e. are you seeing a slowdown or invalid results?

The results are correct. I was just having trouble figuring out whether this kernel is a convolution or not, since it doesn't have gemm/fft/winograd/inplace in the function name (other kernels would contain h884gemm, and then it's clear). Is it a conv implementation? What do exp/relu mean? In the trace, the threshold (relu) op is a separate one.
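To make the name check concrete, here is a small sketch of what I was doing by eye: splitting a kernel name on underscores and looking for the markers listed above. The marker list is only the one from this post, not an official cuDNN naming scheme, and the helper names are made up for illustration.

```python
# Markers mentioned in this thread; NOT an official cuDNN naming scheme.
KNOWN_MARKERS = ("gemm", "fft", "winograd", "inplace")


def split_kernel_name(name: str) -> list:
    # Kernel names in the profiler output are underscore-separated tokens.
    return name.split("_")


def find_markers(name: str) -> list:
    # Return every known marker that appears anywhere in the kernel name.
    lowered = name.lower()
    return [marker for marker in KNOWN_MARKERS if marker in lowered]


if __name__ == "__main__":
    kernel = (
        "volta_fp16_s884cudnn_fp16_256x128_ldg8_relu_f2f_exp_small_"
        "nhwc2nchw_tn_v1"
    )
    print(split_kernel_name(kernel))
    print(find_markers(kernel))      # no marker from the list is present
    print(find_markers("h884gemm"))  # the fragment seen in other kernels
```

None of the listed markers appears in the kernel from the profile, which is exactly why the name alone did not tell me which algorithm it implements.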

@ptrblck It is indeed a gemm kernel, right?

It would be nice to have a summary table somewhere explaining the kernels and their naming-suffix scheme.

This might be a good feature/documentation request for the cudnn team, so feel free to create this request on the cudnn bug tracker.
