"turing_fp16_s1688gemm_fp16_128x128_ldg8_relu_f2f_tn"

What exactly does this kernel do, and why was it launched during the QKV projection? Is there some internal mechanism that deactivates the ReLU part of this kernel for the matrix multiplication?

FYI: it is launched from the CPU-side op aten::addmm.

This is a float16 matmul (GEMM) kernel used in mixed-precision workloads, most likely launched through cuBLAS.

Okay, but why is there a 'relu' in the name? There is no need for a ReLU activation in a linear projection, right?

The kernel can apply a ReLU in its epilogue, but the name alone does not mean it is actually used. You can check the kernel's output to know for sure: if the ReLU were active, the output would contain no negative values.
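A minimal sketch of that check, using NumPy with random data standing in for the real projection (with PyTorch you would inspect the actual layer output, e.g. via `out.min()`):

```python
import numpy as np

# Random data standing in for a real QKV projection (zero-mean, so a plain
# matmul is essentially guaranteed to produce some negative entries).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float16)   # input activations
w = rng.standard_normal((8, 8)).astype(np.float16)   # projection weight
out = x @ w  # plain matmul, no activation

# If the kernel's ReLU epilogue were active, every entry would be >= 0.
relu_was_applied = bool((out >= 0).all())
print(relu_was_applied)
```

Seeing any negative value in the projection output tells you the ReLU epilogue was not enabled, despite the kernel name.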

Thanks. Can you tell me how exactly I can check the output of this kernel? In the trace JSON file I can only see the output dimensions. The op is just doing the projection for the Q, K, and V matrices by multiplying the weight matrix with the input matrix.

Isolate the matmul call which uses this kernel, e.g. by profiling your code with Nsight Systems with stack traces enabled, then print the matmul output.
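You can also locate the call with the built-in PyTorch profiler. A sketch (assumes PyTorch is installed; the `Linear` layer here is a hypothetical stand-in for your fused QKV projection, run on CPU for simplicity):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Stand-in for a fused QKV projection: one Linear producing Q, K, and V.
proj = torch.nn.Linear(64, 192)
x = torch.randn(8, 64)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    out = proj(x)

# aten::addmm should appear among the recorded ops; on a GPU run with
# float16 inputs this is the op that dispatches the GEMM kernel in question.
names = [evt.name for evt in prof.events()]
print("aten::addmm" in names)

# The kernel's output is just `out`; negative entries mean no ReLU was applied.
print(bool((out < 0).any()))
```

Once you know which module call maps to the kernel (on GPU, `with_stack=True` or Nsight Systems stack traces help with that), printing or inspecting `out` directly answers the ReLU question.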