If the graph to be optimized requires gradient, runNondiffOptimization(gradient.f) will be called. runNondiffOptimization() runs several optimizations, including BatchMM and FuseGraph.
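For context, here is a minimal sketch of the kind of graph FuseGraph targets. This assumes a CUDA build (where the pointwise fuser is enabled by default) and the profiling executor; the function name chain is made up for illustration:

```python
import torch

# A chain of pointwise ops; FuseGraph can collapse these into a
# single fusion group in the optimized graph.
@torch.jit.script
def chain(x):
    return torch.relu(torch.tanh(torch.sigmoid(x)))

x = torch.randn(16, 16, device="cuda")
chain(x)  # warm-up runs so the executor can profile and optimize
chain(x)

# The pointwise ops should now appear as one fused subgraph
# (e.g. a prim::FusionGroup / prim::CudaFusionGroup node).
print(torch.jit.last_executed_optimized_graph())
```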
I wonder: why are these optimizations non-differentiable?
Take FuseGraph as an example: it fuses consecutive pointwise ops. If it can fuse f(g(x)), why can't it fuse df(g(x)) * dg(x)?
Or is it simply that the current implementation of FuseGraph does not support this?
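To make the question concrete: when f and g are pointwise, every factor of df(g(x)) * dg(x) is itself pointwise, so the backward looks just as fusible as the forward. A small check with f = tanh and g = sigmoid (the choice of functions here is only for illustration):

```python
import torch

x = torch.randn(8, requires_grad=True)

# Forward: f(g(x)) with f = tanh, g = sigmoid, both pointwise.
y = torch.tanh(torch.sigmoid(x))
y.sum().backward()

# Manual chain rule df(g(x)) * dg(x):
#   tanh'(s) = 1 - tanh(s)^2, sigmoid'(x) = s * (1 - s), s = sigmoid(x).
# Every factor is pointwise, so the backward is a pointwise chain too.
s = torch.sigmoid(x)
manual = (1 - torch.tanh(s) ** 2) * s * (1 - s)
print(torch.allclose(x.grad, manual))  # True
```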