Hi folks,
I am testing the performance of Dynamo + Inductor on a PyG (torch_geometric) model and found many graph breaks that significantly hurt performance.
I set torch._dynamo.config.log_level to DEBUG and found that for most of the graph breaks the GraphCompileReason is return_value.
For example:
GraphCompileReason(reason='return_value', user_stack=[<FrameSummary file /home/X/trienv/lib/python3.8/site-packages/torch_geometric/nn/conv/arma_conv.py, line 143 in message>])
and the corresponding code line is
return edge_weight.view(-1, 1) * x_j # this line
It makes sense to me that a function call creates a new frame and can therefore result in a graph break. What puzzles me is that, as the log shows, many function calls containing RETURN_VALUE are inlined without any graph break, such as
[2023-03-14 14:23:10,376] torch._dynamo.symbolic_convert: [DEBUG] INLINING <code object scatter_add at 0x7f7e2d852660, file "/home/X/trienv/lib/python3.8/site-packages/torch_scatter-2.1.0-py3.8-linux-x86_64.egg/torch_scatter/scatter.py", line 26>
29 0 LOAD_GLOBAL 0 (scatter_sum)
2 LOAD_FAST 0 (src)
4 LOAD_FAST 1 (index)
6 LOAD_FAST 2 (dim)
8 LOAD_FAST 3 (out)
10 LOAD_FAST 4 (dim_size)
12 CALL_FUNCTION 5
14 RETURN_VALUE
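To make the puzzle concrete, here is a minimal standard-library sketch that does not involve Dynamo at all. The scatter_sum stub, the simplified message signature, and the last_opname helper are stand-ins made up for illustration, not the real torch_scatter or torch_geometric code. It only shows that both functions compile to bytecode ending in RETURN_VALUE, so the opcode by itself does not seem to be what separates the inlined call from the one reported as a graph break.

```python
import dis


def scatter_sum(src, index, dim=-1, out=None, dim_size=None):
    # Hypothetical stub standing in for torch_scatter.scatter_sum.
    return src


def scatter_add(src, index, dim=-1, out=None, dim_size=None):
    # Same shape as the wrapper that the log shows being inlined.
    return scatter_sum(src, index, dim, out, dim_size)


def message(x_j, edge_weight):
    # Same shape as the arma_conv.py line reported as a graph break.
    return edge_weight.view(-1, 1) * x_j


def last_opname(fn):
    """Return the name of the final bytecode instruction of fn."""
    return list(dis.get_instructions(fn))[-1].opname


# Print the final opcode of each function for comparison.
for fn in (scatter_add, message):
    print(fn.__name__, last_opname(fn))
```

Neither function is ever called here; dis only inspects the compiled bytecode, so no tensors are needed.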
I have two questions:
- Does Dynamo inline function calls greedily, and only break the graph when inlining is not possible?
- Is there any principle or guideline for writing Python code so that a function with RETURN_VALUE gets inlined and does not cause a graph break in Dynamo?