Question about graph break related to RETURN_VALUE and INLINE function call

Hi folks,

I am testing the performance of Dynamo + Inductor on a PyG model and found many graph breaks that significantly hurt performance.
After setting torch._dynamo.config.log_level to DEBUG, I found that the GraphCompileReason for most of these graph breaks is return_value.
For example:

GraphCompileReason(reason='return_value', user_stack=[<FrameSummary file /home/X/trienv/lib/python3.8/site-packages/torch_geometric/nn/conv/, line 143 in message>])

and the corresponding code line is

        return edge_weight.view(-1, 1) * x_j # this line

It makes sense that a function call creates a new frame and can therefore result in a graph break. What puzzles me is that, according to the log, many function calls that contain RETURN_VALUE are inlined without a graph break, for example:

[2023-03-14 14:23:10,376] torch._dynamo.symbolic_convert: [DEBUG] INLINING <code object scatter_add at 0x7f7e2d852660, file "/home/X/trienv/lib/python3.8/site-packages/torch_scatter-2.1.0-py3.8-linux-x86_64.egg/torch_scatter/", line 26> 
  29           0 LOAD_GLOBAL              0 (scatter_sum)
              2 LOAD_FAST                0 (src)
              4 LOAD_FAST                1 (index)
              6 LOAD_FAST                2 (dim)
              8 LOAD_FAST                3 (out)
             10 LOAD_FAST                4 (dim_size)
             12 CALL_FUNCTION            5
             14 RETURN_VALUE

I have two questions:

  1. Does Dynamo inline function calls greedily, and break the graph only when inlining is not possible?
  2. Are there any principles or guidelines for writing Python code so that a function containing RETURN_VALUE is inlined and does not cause a graph break in Dynamo?

After reading the log carefully, I found that the return_value GraphCompileReason comes from the normal compilation of a function frame. As for the functions that were not inlined, I discovered that the cause is a function call further down the stack that itself failed to inline.
For example, if a function fails to inline, the whole call stack leading to that function is not inlined either, which makes the log not entirely intuitive.
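
The effect described above can be reproduced on a small scale. The following is a minimal sketch, not the PyG code: all function names are hypothetical, and the counting backend is just an illustrative way to see how many separate graphs Dynamo hands to a backend. It assumes a PyTorch 2.x install where torch.compile accepts a callable backend.

```python
import torch
import torch._dynamo


def make_counting_backend(graphs):
    # Hypothetical helper: a backend that records each captured FX graph
    # and then runs it unmodified.
    def backend(gm, example_inputs):
        graphs.append(gm)
        return gm.forward
    return backend


def clean_helper(a):
    # Fully traceable, so Dynamo can inline it into the caller's graph.
    return a * 3


def breaking_helper(a):
    # Python `if` on a tensor value: Dynamo cannot trace through this.
    if a.sum() > 0:
        return a * 3
    return a - 3


def outer_clean(x):
    return clean_helper(x + 1) * 2


def outer_breaking(x):
    return breaking_helper(x + 1) * 2


def count_graphs(fn, x):
    torch._dynamo.reset()  # drop compiled code cached by earlier runs
    graphs = []
    compiled = torch.compile(fn, backend=make_counting_backend(graphs))
    out = compiled(x)
    return len(graphs), out


x = torch.ones(4)
n_clean, out_clean = count_graphs(outer_clean, x)
n_break, out_break = count_graphs(outer_breaking, x)
print("graphs with clean helper:   ", n_clean)
print("graphs with breaking helper:", n_break)
```

With the clean helper the caller and callee end up in a single graph. With the breaking helper the caller is split around the call, and the helper is compiled as its own frame, which is where a return_value compile reason would be expected to show up in the log.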

To avoid graph breaks, after diagnosing several cases in GNN scenarios, I personally think the key is to avoid comparisons on tensor values in Python control flow, especially in frequently used helper functions, because they always fragment the graph.
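
As an illustration of that guideline, here is a hedged sketch of rewriting a value-dependent branch into a branch-free form. The two message functions are hypothetical stand-ins loosely modeled on the line quoted in the question, not the actual torch_geometric code. The check uses torch.compile with fullgraph=True, which raises instead of silently breaking the graph; the exact exception class has differed between releases, so the sketch catches Exception broadly.

```python
import torch
import torch._dynamo


def message_branchy(edge_weight, x_j):
    # Python `if` on a tensor value: forces a graph break.
    if edge_weight.max() > 0:
        return edge_weight.view(-1, 1) * x_j
    return x_j


def message_branch_free(edge_weight, x_j):
    # Same result, but the selection stays inside the graph.
    # Note that both alternatives are computed every time.
    scaled = edge_weight.view(-1, 1) * x_j
    return torch.where(edge_weight.max() > 0, scaled, x_j)


def has_graph_break(fn, *args):
    # Hypothetical helper: True if Dynamo cannot capture fn as one graph.
    torch._dynamo.reset()
    try:
        torch.compile(fn, backend="eager", fullgraph=True)(*args)
    except Exception:
        return True
    return False


w = torch.tensor([0.5, 2.0, 1.0])
x = torch.ones(3, 2)
print("branchy breaks:    ", has_graph_break(message_branchy, w, x))
print("branch-free breaks:", has_graph_break(message_branch_free, w, x))
```

The trade-off is that torch.where evaluates both alternatives, so this rewrite fits best when the two branches are cheap or have the same shape, as they do here.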