I’m working on a project that focuses on runtime recompilation for approximate computing, and I have a proof-of-concept for C++ front-ends using LLVM’s JIT compilation. I’m looking into expanding my approach to target object-recognition frameworks, and I’ve come across lots of frameworks that are built around PyTorch.
In short, I’m looking for a way to recompile PyTorch operators with different levels of optimization, from an arbitrary application, at runtime. I’m exploring ways to bind my framework into these operators, but I’m finding the structure of PyTorch’s dispatch pretty complex. This excellent blog post clarified some things for me, but I’m still having trouble understanding where the dispatch routines are located in the source.
Given this background, my questions are:
Given an arbitrary operator (e.g. Linear), what is the path in the source code from the front-end Python API down to its kernel call in C++?
Is it doable to hack the source and simplify the dispatch structure (at the C++ level) if I’m only interested in a specific device for a proof-of-concept? E.g. always jumping directly to the CPU version of each kernel?
The reason for the second question is that my proof-of-concept requires executing directly from LLVM IR, which is trivial to produce from C++ source, but I have no idea how to do the same conversion starting from Python’s front-end.
Any help would be appreciated, especially in deciphering the structure of PyTorch’s dispatch so I know where to start.
Thanks in advance!