In graph_executor.cpp, if the graph being optimized needs a gradient, runNondiffOptimization(gradient.f) is called.
runNondiffOptimization() runs several optimizations, including BatchMM and FuseGraph.
Why are these optimizations considered non-differentiable?
Take FuseGraph as an example: it fuses consecutive pointwise ops.
If it can fuse f(g(x)), why can't it also fuse the backward expression df(g(x)) * dg(x)?
The chain rule for a composition of pointwise ops produces another chain of pointwise ops, so the backward pass looks just as fusible as the forward pass.
Or is it simply that the current implementation of FuseGraph does not support this?
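To make the question concrete, here is a small plain-Python sketch (not PyTorch internals; the function names are mine) showing that the derivative of a chain of pointwise ops is itself a purely pointwise chain, which is what makes the backward expression look fusible in principle:

```python
import math

# Two pointwise ops and their hand-written derivatives.
def f(y):  return math.tanh(y)
def df(y): return 1.0 - math.tanh(y) ** 2

def g(x):  return x * x
def dg(x): return 2.0 * x

def fused_forward(x):
    # f(g(x)) -- a single pointwise chain that a fuser like
    # FuseGraph could compile into one kernel.
    return f(g(x))

def fused_backward(x):
    # d/dx f(g(x)) = df(g(x)) * dg(x) by the chain rule.
    # This is also purely pointwise, so structurally it is
    # just as fusible as the forward chain.
    return df(g(x)) * dg(x)

# Finite-difference check that the backward chain really is
# the derivative of the forward chain.
x, eps = 0.7, 1e-6
numeric = (fused_forward(x + eps) - fused_forward(x - eps)) / (2 * eps)
assert abs(fused_backward(x) - numeric) < 1e-6
```

So mathematically nothing blocks fusing the backward expression; my question is whether the restriction is a design decision in the executor or just a current limitation of FuseGraph.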