Performance profiling with nvprof of a PyTorch model with one convolution layer.
Setup:
For simplicity, the model consists of a single Conv1d + ReLU layer.
Input size: [32, 64, 1664], kernel_size: 11
cudnn.benchmark = True is enabled.
The model is pre-warmed before profiling.
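For reference, a minimal sketch of this setup (out_channels=64 and the warm-up count are my assumptions, since they aren't stated above; fp16 on a Volta GPU would be needed to actually hit the s884cudnn kernels, so the sketch falls back to fp32 on CPU):

```python
import torch
import torch.nn as nn

# Let cudnn autotune and pick the fastest conv algorithm
torch.backends.cudnn.benchmark = True

device = "cuda" if torch.cuda.is_available() else "cpu"
# fp16 is what triggers the volta_fp16_s884cudnn kernels; fp32 on CPU otherwise
dtype = torch.float16 if device == "cuda" else torch.float32

# out_channels=64 is an assumption for illustration
model = nn.Sequential(
    nn.Conv1d(in_channels=64, out_channels=64, kernel_size=11),
    nn.ReLU(),
).to(device=device, dtype=dtype)

x = torch.randn(32, 64, 1664, device=device, dtype=dtype)

# Warm-up passes so cudnn.benchmark autotuning and lazy initialization
# don't appear in the profile
with torch.no_grad():
    for _ in range(5):
        y = model(x)

print(y.shape)  # no padding: length 1664 - 11 + 1 = 1654
```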
In the profiler, I see that the kernel volta_fp16_s884cudnn_fp16_256x128_ldg8_relu_f2f_exp_small_nhwc2nchw_tn_v1 takes most of the time.
Could you please tell me:
What is the purpose of this function?
Why are there no references to convolutions in the name of the function?
The results are correct. I was just having trouble figuring out whether this kernel is a convolution, since its name contains none of gemm/fft/winograd/inplace (other kernels contain h884gemm, which makes them obvious). Is it a conv implementation? What do exp and relu mean here? In the trace, the threshold (ReLU) op shows up as a separate kernel.