I’m working at a new NPU vendor (https://furiosa.ai), and I’m looking for the right way to integrate the PyTorch framework with our chip.
We’ve found torchdynamo (TorchDynamo and TorchInductor Tutorial — PyTorch Tutorials 1.13.0+cu117 documentation), and this approach looks very promising for exposing our accelerator via PyTorch. We did some experiments implementing a torchdynamo backend via ONNX export, since our compiler stack has been using ONNX as one of its major input formats. (fx.graph → ONNX → our compiler’s codegen function)
Now, with a somewhat better understanding of fx.graph, we’re further investigating feasible ways to leverage the powerful fx.graph to optimize the graph before passing unoptimized ONNX to our compiler. (e.g. we could split a graph into a subgraph that our compiler can accelerate, and the rest, thereby simplifying the compiler & runtime implementation)
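The supported/unsupported split mentioned above can be prototyped with `torch.fx.passes.split_module`. The `SUPPORTED_OPS` set below is a hypothetical placeholder for whatever the vendor compiler actually accelerates; this is only a sketch of the idea, not the poster’s implementation.

```python
# Splitting an fx.graph into "accelerator" vs. "host" partitions.
import operator
import torch
from torch.fx.passes.split_module import split_module

SUPPORTED_OPS = {torch.relu, operator.add}  # hypothetical capability set

def partition(node: torch.fx.Node) -> int:
    # Partition 0 goes to the accelerator, partition 1 stays on the host.
    if node.op == "call_function" and node.target in SUPPORTED_OPS:
        return 0
    return 1

class M(torch.nn.Module):
    def forward(self, x):
        return torch.sigmoid(torch.relu(x) + x)

gm = torch.fx.symbolic_trace(M())
split = split_module(gm, M(), partition)
# `split` now contains one submodule per partition id, called in order.
```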
One question arising from this work is: “can we move the graph-splitting workflow, which our compiler has handled until now, into the torchdynamo side?” We’ve found that the early graph transformations for optimization look similar to each other, and our compiler has been doing this optimization at the ONNX level. If we could do it on the torchdynamo backend side with fx.graph, we think it would be more flexible and scalable.
This raises a practical implementation question: our compiler is written in Rust, and we want to transform the graph (whether fx.graph or ONNX) on the Rust side. I understand that fx.graph originally came from capturing a Python program, so it may seem a bit odd to want to use it from a different language, but I’d like to ask anyway, since we’re not very familiar with the PyTorch ecosystem.
Is there any viable way to use fx.graph from a different language (or framework, implementation, etc.)? If not, what’s the right approach for a vendor who wants to pre-optimize graphs before compilation?
From my experience with fx, it’s a very Pythonic thing; I would not try to manipulate it in another language. If you want to do something in another language, I would convert the fx graph to some simple graph representation of your own and then move to the other language.
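A minimal sketch of that suggestion: flatten the fx graph into a plain dict-of-nodes that can be serialized to JSON and consumed from Rust (or any other language). The schema here is made up purely for illustration.

```python
# Flatten an fx graph into a language-neutral JSON structure.
import json
import torch

def fx_to_simple_graph(gm: torch.fx.GraphModule) -> str:
    nodes = []
    for node in gm.graph.nodes:
        nodes.append({
            "name": node.name,
            "op": node.op,                # placeholder / call_function / output / ...
            "target": str(node.target),   # stringified; enough for a first pass
            "inputs": [n.name for n in node.all_input_nodes],
        })
    return json.dumps({"nodes": nodes})

class M(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1

print(fx_to_simple_graph(torch.fx.symbolic_trace(M())))
```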
As for the path you are describing, optimizing the graph in fx before compilation: my current experience is that the dynamo export path is still not mature enough and is still a work in progress. I found it hard to control graph granularity, to export custom operations, and to split/merge graphs. Plain fx to ONNX might be more workable at this point in time, but torch is moving to dynamo, so I am not sure how that will look in a few months.
fx is a stringly-typed interface, so there are APIs to manipulate it in Python, but there’s no hard reason you have to do it there.
Copying over some links you might find helpful:
We have inference-only backends: Getting Started — PyTorch master documentation
This is the core aten IR spec: IRs — PyTorch master documentation
This is how backends are registered in core today: pytorch/torch/_dynamo/backends at master · pytorch/pytorch · GitHub (take a look at the ONNX and IPEX ones for inspiration). We are trying to move away from having backends in core, though.
This is a fantastic notebook tutorial that shows you how to extract the aten IR and work with it: Google Colab
Doc for how to register a custom backend: Custom Backends — PyTorch master documentation
This is an example of an out-of-tree backend for the BladeDISC compiler that was built without our involvement: BladeDISC/pytorch_blade/torch_blade/dynamo at main · alibaba/BladeDISC · GitHub
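On the aten IR point above, the core idea from the linked notebook can be condensed to a few lines: `make_fx` traces a function down to `torch.ops.aten` calls, which is the level a vendor backend would typically consume. This is only a sketch of that one step, not the full tutorial.

```python
# Extract an aten-level fx graph with make_fx.
import torch
from torch.fx.experimental.proxy_tensor import make_fx

def f(x):
    return torch.nn.functional.silu(x)

gm = make_fx(f)(torch.randn(3))
for node in gm.graph.nodes:
    # call_function targets are now torch.ops.aten overloads
    print(node.op, node.target)
```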
Thanks for the answers, guys.
Having gained a bit more understanding of fx.graph, we’ve built a Rust binding for GraphModule itself. Although fx semantics are Python-specific and a bit hard to reproduce without the implementation details, the fundamental fx.graph data structure itself is simple enough that we can extract from it.
So as a result, we’re now modifying fx.graph on the Rust side, leveraging our existing code, but we’re still looking for the best way to combine the two worlds (to follow Torch’s Python-first principle). We’re planning to open-source our implementation soon, so stay tuned.
That sounds wonderful, please share your work in this forum when you’re ready
Just curious, how do you convert an fx GraphModule to ONNX? I’m looking for a tool or something that can handle this.