What exactly does this kernel do? And why was it launched during the QKV projection? Is there some internal mechanism that deactivates the kernel's ReLU for this matrix multiplication?
The kernel can apply a ReLU in its epilogue, but the name alone does not mean the ReLU is actually used. Check its output to know for sure. A ReLU never produces negative values, so if the output contains any negative entries, the ReLU was not applied.
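Here is a minimal sketch of that check in plain Python (no GPU, no cuBLAS). It assumes the kernel's result is a fused QKV projection `x @ w_qkv` that you can copy back to the host. The sizes, the random data and the helper names (`matmul`, `relu`, `has_negative`) are hypothetical and only illustrate the idea. The same GEMM is computed with and without a ReLU applied afterwards, and only the version without ReLU can contain negative values.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product: (m x k) @ (k x n) -> (m x n)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def relu(m):
    """Element-wise ReLU, standing in for a ReLU epilogue (hypothetical helper)."""
    return [[max(0.0, v) for v in row] for row in m]

def has_negative(output):
    """True if any element is negative. A ReLU output never is, so True means
    no ReLU was applied. False is only suggestive, because a plain GEMM can
    also produce all non-negative values."""
    return any(v < 0.0 for row in output for v in row)

random.seed(0)
# Hypothetical shapes: 4 tokens, hidden size 8, fused Q/K/V weight of 8 x (3 * 8).
x = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
w_qkv = [[random.gauss(0, 1) for _ in range(24)] for _ in range(8)]

plain = matmul(x, w_qkv)   # GEMM without a ReLU epilogue
with_relu = relu(plain)    # same GEMM followed by a ReLU

print(has_negative(plain), has_negative(with_relu))
```

In a real PyTorch model, you could run the same test on the projection output. For example, capture it with `torch.nn.Module.register_forward_hook` on the module that performs the QKV projection, then check `(out < 0).any()`. Which module that is depends on your model.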
Thanks. Can you tell me exactly how I can check this kernel's output? In the trace JSON file I can only see the output dimensions. The kernel only computes the Q, K and V projections by multiplying the input matrix by the weight matrix.