"turing_fp16_s1688gemm_fp16_128x128_ldg8_relu_f2f_tn"

What exactly does this kernel do, and why was it launched during the QKV projection? Is there some internal mechanism that deactivates the ReLU part of this kernel for the matrix multiplication?

FYI: it is launched from the CPU-side op aten::addmm.

This is a float16 matmul (GEMM) kernel used in mixed-precision workloads, most likely launched through cuBLAS.

Okay, but why is there a 'relu' in the name? There is no need for a ReLU activation in a linear projection, right?

The kernel can apply a ReLU in its epilogue, but the name alone does not mean it is actually used. You can check the kernel's output to know for sure: if the ReLU were active, the output would contain no negative values.
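A minimal sketch of that check, using NumPy with random data standing in for the real projection (with PyTorch you would inspect the actual layer output, e.g. via `out.min()`):

```python
import numpy as np

# Random data standing in for a real QKV projection (zero-mean, so a plain
# matmul is essentially guaranteed to produce some negative entries).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float16)   # input activations
w = rng.standard_normal((8, 8)).astype(np.float16)   # projection weight
out = x @ w  # plain matmul, no activation

# If the kernel's ReLU epilogue were active, every entry would be >= 0.
relu_was_applied = bool((out >= 0).all())
print(relu_was_applied)
```

Seeing any negative value in the projection output tells you the ReLU epilogue was not enabled, despite the kernel name.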

Thanks. Can you tell me how exactly I can check the output of this kernel? In the trace JSON file I can only see the output dimensions. The op is just doing the projection for the Q, K, and V matrices by multiplying the weight matrix with the input matrix.

Isolate the matmul call which uses this kernel, e.g. by profiling your code with Nsight Systems with stack traces enabled, then print the matmul output.
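You can also locate the call with the built-in PyTorch profiler. A sketch (assumes PyTorch is installed; the `Linear` layer here is a hypothetical stand-in for your fused QKV projection, run on CPU for simplicity):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Stand-in for a fused QKV projection: one Linear producing Q, K, and V.
proj = torch.nn.Linear(64, 192)
x = torch.randn(8, 64)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    out = proj(x)

# aten::addmm should appear among the recorded ops; on a GPU run with
# float16 inputs this is the op that dispatches the GEMM kernel in question.
names = [evt.name for evt in prof.events()]
print("aten::addmm" in names)

# The kernel's output is just `out`; negative entries mean no ReLU was applied.
print(bool((out < 0).any()))
```

Once you know which module call maps to the kernel (on GPU, `with_stack=True` or Nsight Systems stack traces help with that), printing or inspecting `out` directly answers the ReLU question.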