Hello,
I observed that in:
the accuracy threshold of that test is rtol=5e-2, atol=0.07. It is not clear to me why this is an acceptable accuracy threshold.
- Why can that threshold not be tightened to enforce better accuracy?
- Is there a similar test on CPU or another platform that could provide some insight into what thresholds are used for other hardware?
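
To make the question concrete, here is a minimal sketch of how I understand such a threshold to be applied. It assumes the test uses the usual allclose-style check, `|actual - expected| <= atol + rtol * |expected|` (the convention followed by `numpy.isclose` and `torch.isclose`); I have not confirmed which comparison function the test calls. The helper names are hypothetical and are not taken from the test.

```python
# Assumed tolerance rule: |actual - expected| <= atol + rtol * |expected|
# Default values are the ones quoted above for the test in question.

def max_allowed_error(expected, rtol=5e-2, atol=0.07):
    """Largest absolute error the assumed rule accepts for a given expected value."""
    return atol + rtol * abs(expected)


def within_tolerance(actual, expected, rtol=5e-2, atol=0.07):
    """True if `actual` passes the assumed allclose-style check against `expected`."""
    return abs(actual - expected) <= max_allowed_error(expected, rtol, atol)


if __name__ == "__main__":
    # Show how the permitted error scales with the magnitude of the expected value.
    for expected in (0.0, 0.1, 1.0, 10.0):
        allowed = max_allowed_error(expected)
        print(f"expected={expected:>5}: allowed absolute error = {allowed:.3f}")
```

Under that assumption, the absolute term dominates for small outputs: an expected value of 0.1 may be off by 0.075, which is 75% of the value itself. That is the part I would like to understand.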
This question is related to ongoing investigations into kernels used in vLLM: "Add padding support to wvSplitK solution for skinny GEMMs" by amd-hhashemi (Pull Request #33762, vllm-project/vllm).
