Output AST of code run on CUDA?

Is there a way to output some form of the compiled CUDA code that PyTorch runs, or some human-readable representation of the graph? Preferably without rebuilding PyTorch.


Do you mean from the JIT? Without the JIT, PyTorch runs everything eagerly as a dynamic graph, so there is no compilation step.
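
If the JIT route is acceptable, here is a minimal sketch of how the traced graph could be printed without rebuilding anything. The function `f` and the input shapes are made up for illustration; what you get is the TorchScript IR (a list of `aten::` ops), not CUDA source.

```python
import torch

def f(x, y):
    # Hypothetical toy function, only here to have something to trace.
    return torch.relu(x @ y) + 1.0

# Tracing records the ops that run for these example inputs.
traced = torch.jit.trace(f, (torch.randn(2, 3), torch.randn(3, 4)))

graph_text = str(traced.graph)  # TorchScript IR as text
print(graph_text)
print(traced.code)              # Python-like rendering of the same graph
```

Note that tracing only captures the path taken for the example inputs, so data-dependent control flow will not show up in the printed graph.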

Gotcha. Is there a way to log CUDA kernel launches or something like that?

The profiler does something similar, but at the op level, not the kernel level.
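
A rough sketch of what that looks like with `torch.profiler`, assuming a reasonably recent PyTorch (1.8.1 or later). The tensors and the `mm`/`relu` workload are placeholders; the CUDA activity is only requested when a GPU is actually available, so this also runs on a CPU-only install.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Always record CPU-side ops; add CUDA activity only if a GPU is present.
activities = [ProfilerActivity.CPU]
device = "cpu"
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)
    device = "cuda"

# Placeholder workload, just to give the profiler something to record.
x = torch.randn(64, 64, device=device)
y = torch.randn(64, 64, device=device)

with profile(activities=activities) as prof:
    z = torch.mm(x, y).relu()

# One row per recorded event name (e.g. aten::mm), aggregated over calls.
table = prof.key_averages().table(sort_by="cpu_time_total")
print(table)
```

The rows are keyed by op name, so this tells you which ops ran and how long they took rather than giving you a launch-by-launch log.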