I am testing the performance of dynamo+inductor on a PyG model and found that there are many graph breaks, which greatly hurt performance.
I set _dynamo.config.log_level to DEBUG and found that most of the graph breaks are reported with the following reason:
GraphCompileReason(reason='return_value', user_stack=[<FrameSummary file /home/X/trienv/lib/python3.8/site-packages/torch_geometric/nn/conv/arma_conv.py, line 143 in message>])
and the corresponding code line is
return edge_weight.view(-1, 1) * x_j # this line
It makes sense that a function call creates a new frame and can therefore result in a graph break. What puzzles me is that many other function calls containing RETURN_VALUE are inlined without a graph break, as the log shows. For example:
[2023-03-14 14:23:10,376] torch._dynamo.symbolic_convert: [DEBUG] INLINING <code object scatter_add at 0x7f7e2d852660, file "/home/X/trienv/lib/python3.8/site-packages/torch_scatter-2.1.0-py3.8-linux-x86_64.egg/torch_scatter/scatter.py", line 26>
 29           0 LOAD_GLOBAL              0 (scatter_sum)
              2 LOAD_FAST                0 (src)
              4 LOAD_FAST                1 (index)
              6 LOAD_FAST                2 (dim)
              8 LOAD_FAST                3 (out)
             10 LOAD_FAST                4 (dim_size)
             12 CALL_FUNCTION            5
             14 RETURN_VALUE
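To make the comparison concrete, here is a minimal sketch using only the standard-library dis module (no torch involved). The two functions are stand-ins I wrote to mirror the shape of the message line in arma_conv.py and of scatter_add in the log above; they are only compiled, never called, so the undefined name scatter_sum is harmless. The helper opnames is hypothetical. It only shows that both functions end in the same RETURN_VALUE instruction, so the opcode alone cannot be what distinguishes the inlined call from the one that breaks the graph.

```python
import dis


def message(edge_weight, x_j):
    # Stand-in with the same shape as the line in arma_conv.py.
    return edge_weight.view(-1, 1) * x_j


def scatter_add_like(src, index, dim=-1, out=None, dim_size=None):
    # Stand-in for torch_scatter's scatter_add; scatter_sum is deliberately
    # left undefined because this function is only disassembled, not run.
    return scatter_sum(src, index, dim, out, dim_size)  # noqa: F821


def opnames(fn):
    """Return the list of opcode names in fn's bytecode (hypothetical helper)."""
    return [ins.opname for ins in dis.get_instructions(fn)]


if __name__ == "__main__":
    for fn in (message, scatter_add_like):
        # Print the function name and the final opcode of its bytecode.
        print(fn.__name__, opnames(fn)[-1])
```

Running dis.dis(message) on the same function prints a full listing in the same format as the DEBUG log.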
I have two questions:
- Does Dynamo inline function calls greedily, and only break the graph when inlining is not possible?
- Is there any principle or guideline for writing Python code so that a function with RETURN_VALUE is inlined and a graph break is avoided in Dynamo?