Triton Error [CUDA]: an illegal memory access was encountered

Cross-post from here with a follow-up.
Could you post a minimal, executable code snippet that reproduces the issue, or grab the stack trace from cuda-gdb, please?
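
In case it helps with collecting that, here is a rough sketch of one way to rerun a repro with synchronous kernel launches, so the illegal memory access is reported at the launch that triggered it rather than at some later CUDA call. The script name `repro.py` and the helper names `blocking_env` and `run_repro` are placeholders I made up for illustration. The only real knob used is the `CUDA_LAUNCH_BLOCKING` environment variable.

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    # CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the error
    # surfaces at the offending launch instead of a later, unrelated API call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_repro(script="repro.py"):
    """Run the (hypothetical) repro script in a child process and capture its output."""
    return subprocess.run(
        [sys.executable, script],
        env=blocking_env(),
        capture_output=True,
        text=True,
    )


# Example usage, assuming a file named repro.py exists next to this script:
#   result = run_repro("repro.py")
#   print(result.stderr)  # the Python traceback should now point at the failing launch
```

For the cuda-gdb route, something along the lines of `cuda-gdb --args python repro.py`, followed by `run` and then `bt` once it stops on the fault, should give a backtrace, assuming cuda-gdb is installed alongside your CUDA toolkit.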