Question on CustomFuseGraph (graph_fuser): pytorch/tvm, pytorch/glow

Hi all,
I have some basic questions about CustomFuseGraph and its relation to other projects. I see that pytorch/tvm and pytorch/glow use their own custom fusion passes instead of CustomFuseGraph. Could someone explain why they don’t use the graph_fuser provided by PyTorch? Sorry if this is a naive question.


Traditionally, the graph fuser in PyTorch has operated on a per-Block basis, and CustomFuseGraph inherits this behavior (or at least it did a few weeks ago).
PyTorch TVM (and probably also Glow) wants to fuse across blocks, e.g. to also fuse control-flow nodes. This is why PyTorch TVM switched away from CustomFuseGraph. Personally, I expect the custom fusion mechanism to be folded back into PyTorch once the picture of what users need there becomes clearer, just as CustomFuseGraph was introduced when it seemed it would be broadly useful. If I recall correctly, we originally thought CustomFuseGraph might already work for PyTorch TVM.
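A toy model may make the per-Block limitation concrete. This is plain Python, not PyTorch’s IR or the CustomFuseGraph API. Node, FUSIBLE and fuse_block are hypothetical names, and the fusible-op set is an assumption. The fuser greedily groups consecutive fusible nodes within one block. A node that owns sub-blocks (such as prim::If) closes the current group, and its sub-blocks are fused on their own, so no group ever spans a block boundary.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    kind: str
    # Sub-blocks owned by this node (e.g. the two branches of a prim::If).
    blocks: List[List["Node"]] = field(default_factory=list)

# Hypothetical set of ops this toy fuser is willing to fuse.
FUSIBLE = {"aten::add", "aten::mul"}

def fuse_block(block: List[Node]) -> List[List[str]]:
    """Return groups (by kind) of >= 2 consecutive fusible nodes, per block."""
    groups: List[List[str]] = []
    run: List[str] = []
    for node in block:
        if node.kind in FUSIBLE and not node.blocks:
            run.append(node.kind)
            continue
        # Any other node, including control flow, closes the current group.
        if len(run) > 1:
            groups.append(run)
        run = []
        # Sub-blocks are fused independently: a group never spans the
        # boundary between a block and its parent block.
        for sub in node.blocks:
            groups.extend(fuse_block(sub))
    if len(run) > 1:
        groups.append(run)
    return groups

# Example: a few ops, then an If whose first branch holds add_, mul, add.
graph = [
    Node("aten::zeros"), Node("aten::eq"), Node("aten::Bool"),
    Node("prim::If", blocks=[
        [Node("aten::add_"), Node("aten::mul"), Node("aten::add")],
        [Node("aten::add")],
    ]),
]
print(fuse_block(graph))  # [['aten::mul', 'aten::add']]
```

The prim::If node itself never joins a group here. Fusing control-flow nodes, as PyTorch TVM wanted, would require a fuser that can treat the whole If, blocks included, as a fusion candidate.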

Best regards



@tom: Awesome, thanks for taking the time to respond. I think that largely answers my question. So, is the plan to upstream these changes to CustomFuseGraph? Also, torch/tvm’s custom fusion seems to ignore control-flow ops for now, as seen in fusion_pass.cpp.

I was also experimenting a bit, and it seems that fusing blocks of control-flow ops is not currently handled in subgraph_utils. Is that right?

Let’s take this graph as an example:

graph(%x.1 : Float(*)):
  %18 : Float(1) = prim::Constant[value={1}]()
  %16 : int[] = prim::Constant[value=[1]]()
  %1 : None = prim::Constant()
  %2 : int = prim::Constant[value=1]()
  %3 : int = prim::Constant[value=2]()
  %4 : int = prim::Constant[value=3]()
  %ret.1 : Double(*) = aten::zeros(%16, %1, %1, %1, %1)
  %7 : Bool(*) = aten::eq(%x.1, %2)
  %8 : bool = aten::Bool(%7)
  %ret : Tensor(*) = prim::If(%8)
    block0():
      %12 : Tensor = aten::add_(%ret.1, %18, %2)
      %ret.4 : Double(*) = aten::mul(%ret.1, %3)
      %ret.7 : Double(*) = aten::add(%ret.4, %18, %2)
      -> (%ret.7)
    block1():
      %ret.9 : Float(*) = aten::add(%x.1, %4, %2)
      -> (%ret.9)
  return (%ret)

Here, the first node inside block0() of prim::If, aten::add_(), takes %ret.1 as an input, and %ret.1 is defined outside the prim::If. When this node is cloned during mergeNodeIntoSubgraph, the metadata for that input cannot be found. I think this is because, during cloning, the value_map only contains values in the block’s scope, not the graph’s scope. I’m not familiar enough with PyTorch graph manipulation to understand why only the block’s inputs are added to the value_map in ir.cpp, and only the prim::If’s inputs are added to the value map in subgraph_utils. The error seems to arise because value_map(i) in ir.cpp returns NULL, since %ret.1 is not in the value_map.
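Assuming the diagnosis above is right, the failure can be reproduced in a toy model. This is plain Python, not the real ir.cpp or subgraph_utils code. The tuple node format, free_values and clone_block are hypothetical. Cloning a block into a new subgraph renames every input through value_map. If only the If’s own input (%8) is mapped, the lookup for the captured value %ret.1 fails. In this toy, the fix is to first map every value the block uses but does not define.

```python
from typing import Dict, List, Tuple

# Toy node format: (output_name, kind, input_names).
Node = Tuple[str, str, List[str]]

def free_values(block: List[Node]) -> List[str]:
    """Values a block uses but does not define, i.e. captured from outer scopes."""
    defined: set = set()
    free: List[str] = []
    for out, _kind, inputs in block:
        for v in inputs:
            if v not in defined and v not in free:
                free.append(v)
        defined.add(out)
    return free

def clone_block(block: List[Node], value_map: Dict[str, str]) -> List[Node]:
    """Clone nodes into a new subgraph, renaming values through value_map."""
    cloned: List[Node] = []
    for out, kind, inputs in block:
        new_inputs = []
        for v in inputs:
            if v not in value_map:
                # In this toy, this plays the role of the lookup returning NULL.
                raise KeyError(f"no mapping for {v}")
            new_inputs.append(value_map[v])
        value_map[out] = out + "'"  # the cloned value gets a primed name
        cloned.append((value_map[out], kind, new_inputs))
    return cloned

block0 = [
    ("%12", "aten::add_", ["%ret.1", "%18", "%2"]),
    ("%ret.4", "aten::mul", ["%ret.1", "%3"]),
    ("%ret.7", "aten::add", ["%ret.4", "%18", "%2"]),
]

# Only the If's own input is mapped, so cloning fails on %ret.1.
try:
    clone_block(block0, {"%8": "%8'"})
except KeyError as err:
    print("missing:", err)

# Mapping every captured value first (e.g. as new subgraph inputs) succeeds.
value_map = {v: v + "'" for v in free_values(block0)}
print(clone_block(block0, value_map))
```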

Is there an example in code where fusion of control flow ops is handled? Also, please correct me if my understanding is wrong.

I don’t know how to answer most of your questions, and I haven’t looked at this code in a while, given how quickly it changes. Apparently, control-flow support in torch TVM is not quite there yet.
The fusion pass does not currently handle blocks, and I don’t know whether the obstacle torch TVM hit was the graph rearrangement itself or something else.
To the best of my knowledge, we don’t currently fuse in-place ops, so I would expect some adventure when you try that. Maybe that is part of the problem you are seeing.
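One plausible source of that adventure can be shown with a toy. Plain Python lists stand in for tensors, and add_ and block0 here are hypothetical helpers, not ATen functions. Suppose a fusion subgraph treats its inputs as values. It would then hide the mutation that aten::add_ makes to %ret.1 from everything outside the subgraph. Whether that matters depends on whether %ret.1 is used again, which a fuser would presumably need alias information to decide.

```python
from typing import List

def add_(t: List[float], value: float) -> List[float]:
    """Toy stand-in for aten::add_: mutates t in place and returns it."""
    for i in range(len(t)):
        t[i] += value
    return t

def block0(ret_1: List[float], copy_inputs: bool) -> List[float]:
    """block0 of the prim::If in the graph above, with lists as tensors.

    copy_inputs=True models a fused subgraph that treats its inputs as
    immutable values (an assumption made for this toy)."""
    t = list(ret_1) if copy_inputs else ret_1
    add_(t, 1.0)                     # %12 = aten::add_(%ret.1, %18, %2)
    ret_4 = [v * 2 for v in t]       # %ret.4 = aten::mul(%ret.1, %3)
    return [v + 1.0 for v in ret_4]  # %ret.7 = aten::add(%ret.4, %18, %2)

eager = [0.0]
print(block0(eager, copy_inputs=False), eager)  # [3.0] [1.0]
fused = [0.0]
print(block0(fused, copy_inputs=True), fused)   # [3.0] [0.0]
```

Both versions return the same %ret.7, but only the eager one leaves %ret.1 mutated.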

I don’t know of any examples beyond the obvious users. In theory, I would expect the optimizations that remove ifs with a constant true/false condition to give some idea of what it would take to move ops with blocks into a subgraph.
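To illustrate that idea, here is a toy constant-if elimination in plain Python. inline_constant_if and the dict-based node format are hypothetical, not PyTorch’s implementation. Once the condition is known, the taken branch’s nodes are spliced into the parent block. Later uses of the If’s output are then rewired to the value the branch returned. Moving an op with blocks into a fusion subgraph presumably needs similar bookkeeping: the block’s nodes get a new owner, and values flowing into and out of the blocks must be remapped.

```python
from typing import Any, Dict, List

Node = Dict[str, Any]  # keys: "kind", "inputs", "outs", optional "blocks"

def inline_constant_if(block: List[Node], index: int, cond: bool) -> List[Node]:
    """Replace the prim::If at block[index] with the nodes of its taken branch.

    Each entry of if_node["blocks"] is (nodes, returned_values)."""
    if_node = block[index]
    nodes, returns = if_node["blocks"][0 if cond else 1]
    # Uses of the If's outputs are rewired to what the taken branch returned.
    rename = dict(zip(if_node["outs"], returns))
    rest = [
        {**n, "inputs": [rename.get(v, v) for v in n["inputs"]]}
        for n in block[index + 1:]
    ]
    # (A complete pass would also rewrite uses inside nested blocks of `rest`.)
    return block[:index] + list(nodes) + rest

then_branch = ([
    {"kind": "aten::add_", "inputs": ["%ret.1", "%18", "%2"], "outs": ["%12"]},
    {"kind": "aten::mul", "inputs": ["%ret.1", "%3"], "outs": ["%ret.4"]},
    {"kind": "aten::add", "inputs": ["%ret.4", "%18", "%2"], "outs": ["%ret.7"]},
], ["%ret.7"])
else_branch = ([
    {"kind": "aten::add", "inputs": ["%x.1", "%4", "%2"], "outs": ["%ret.9"]},
], ["%ret.9"])
graph = [
    {"kind": "prim::If", "inputs": ["%8"], "outs": ["%ret"],
     "blocks": [then_branch, else_branch]},
    {"kind": "return", "inputs": ["%ret"], "outs": []},
]

out = inline_constant_if(graph, 0, cond=True)
print([n["kind"] for n in out])  # ['aten::add_', 'aten::mul', 'aten::add', 'return']
print(out[-1]["inputs"])         # ['%ret.7']
```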

I’ll be very interested to hear about your progress.

Best regards


@bwasti: Thanks for your response. Could you let me know if my understanding above is correct? I just want to know whether PyTorch even allows fusing control-flow ops into the custom op node.

Yep, your understanding is correct. Control flow is currently unhandled.