Triton Error [CUDA]: an illegal memory access was encountered

Cross-post from here with a follow-up.

Is that a comment or a request? What I put “there” is off topic for the codegen issue. After I posted there, I decided this belonged in a new issue. All the info from there is now here. Are you asking me to make a note there that this discussion has moved here? Sorry for being dense.

Could you post a minimal, executable code snippet so we can debug the issue?

Simplifying 100 thousand lines of code I didn’t write isn’t easy.
I did figure out how to create a standalone call to the JIT-compiled triton_mm kernel that matches the real call that crashes. But when I run it standalone, there isn’t a problem.
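
One possible reason a standalone call runs cleanly while the real one crashes is that the input tensors differ in layout (strides, storage offset, contiguity) even when the shapes and dtypes match. That is only a guess, not something confirmed here. A minimal sketch for comparing the two call sites follows; `describe_tensor` is a hypothetical helper name, and it assumes the arguments are `torch.Tensor` objects.

```python
def describe_tensor(t):
    """Collect the layout metadata a standalone repro would need to match.

    Hypothetical helper: `t` is assumed to be a torch.Tensor, and only
    standard tensor attributes and methods are used.
    """
    return {
        "shape": tuple(t.shape),
        "stride": tuple(t.stride()),
        "dtype": str(t.dtype),
        "storage_offset": t.storage_offset(),
        "contiguous": t.is_contiguous(),
    }
```

Printing this for every argument just before the crashing call and again in the standalone script would show whether the two calls really see the same inputs.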

I’ll see whether cuda-gdb stops at the point where the error first occurs so I can get a stack trace.
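
For reference, here is a rough sketch of how such a run could be set up. Kernel launches are asynchronous, so an illegal memory access is often reported at a later, unrelated call; setting `CUDA_LAUNCH_BLOCKING=1` makes launches synchronous so the error surfaces closer to the offending kernel. The helper name `build_debug_run` and the script name `repro.py` are placeholders I made up; `cuda-gdb --args` and `compute-sanitizer --tool memcheck` are the standard ways to launch a program under those tools.

```python
import os
import sys


def build_debug_run(script, tool="cuda-gdb"):
    """Return (cmd, env) for running `script` under a CUDA debugging tool.

    Hypothetical helper; `script` is the path to the failing program.
    """
    env = dict(os.environ)
    # Synchronous launches: the error is reported at the launch that caused it.
    env["CUDA_LAUNCH_BLOCKING"] = "1"

    if tool == "cuda-gdb":
        cmd = ["cuda-gdb", "--args", sys.executable, script]
    elif tool == "compute-sanitizer":
        cmd = ["compute-sanitizer", "--tool", "memcheck", sys.executable, script]
    else:
        raise ValueError(f"unsupported tool: {tool}")
    return cmd, env


# Example (placeholder script name):
#   import subprocess
#   cmd, env = build_debug_run("repro.py", tool="compute-sanitizer")
#   subprocess.run(cmd, env=env)
```

If cuda-gdb does not stop at a useful place, the compute-sanitizer variant may be worth trying, since memcheck reports the faulting kernel and address directly.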