Is there a way to output some form of the compiled CUDA code that gets run in PyTorch, or some human-readable representation of the graph? Preferably without a rebuild of PyTorch.
Thanks!
Do you mean from the JIT? PyTorch without the JIT runs everything as a dynamic graph, so there is no compilation step.
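For the JIT path, a traced or scripted module exposes its IR through the public `.graph` and `.code` attributes, so no rebuild is needed. A minimal sketch (the `TinyNet` module here is just an illustrative example, not from the thread):

```python
import torch
import torch.nn as nn

# Illustrative example module to trace
class TinyNet(nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1.0

net = TinyNet()
traced = torch.jit.trace(net, torch.randn(2, 3))

# Human-readable TorchScript IR graph of the compiled module
print(traced.graph)

# Python-like source reconstructed from the same graph
print(traced.code)
```

Note this shows the TorchScript graph IR, not the generated CUDA code itself.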
Gotcha, is there a way to log CUDA kernel launches or something like that?
The profiler does something similar to that, but at the op level rather than the kernel level.
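A sketch of that op-level profiling with the autograd profiler; the `use_cuda` flag only has an effect when a GPU is present, so this guards on availability:

```python
import torch

x = torch.randn(64, 64)
if torch.cuda.is_available():
    x = x.cuda()

# Record a few ops; use_cuda adds CUDA event timings when a GPU is present
with torch.autograd.profiler.profile(use_cuda=torch.cuda.is_available()) as prof:
    y = torch.mm(x, x)
    z = y.relu()

# Summary table: one row per ATen op, not per CUDA kernel
print(prof.key_averages().table(sort_by="cpu_time_total"))
```

For the actual kernel launches underneath each op, running the script under an external tool such as `nvprof` or Nsight Systems is the usual route.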