I have read some introductions to TorchDynamo. It can emit either multiple sub-graphs (split at graph breaks) or a single graph without any breaks.
I am curious why it still produces multiple sub-graphs if it is able to generate the entire graph. What would we sacrifice if we chose to have no graph breaks at all?
Could you explain this in more detail using the following example?
```python
if x.shape[0] > 10:  # x.shape is a torch.Size (a tuple), so index a dimension before comparing
```
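To experiment with this, here is a minimal runnable sketch of a function in the spirit of the example. It assumes the intended check is on the first dimension (`x.shape[0]`); the function name `toy_example` and both branch bodies are hypothetical, since the original snippet stops at the `if`. `backend="eager"` is used only so the sketch runs without a C++ toolchain: it exercises Dynamo's graph capture but skips Inductor's code generation.

```python
import torch


def toy_example(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical branch bodies; the original snippet ends at the `if`.
    if x.shape[0] > 10:  # branch on the size of the first dimension
        return x * 2
    return x - 1


# backend="eager" runs Dynamo's graph capture but executes the captured
# graph with ordinary PyTorch ops, so no compiler toolchain is needed.
compiled = torch.compile(toy_example, backend="eager")

big = compiled(torch.ones(12))   # takes the `> 10` branch
small = compiled(torch.ones(3))  # takes the other branch
print(big.sum().item(), small.sum().item())
```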
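Graph breaks come with a performance hit: each break hands control back to the Python interpreter and splits the program into separately compiled pieces, so optimizations cannot span a break. That is why you typically want as few of them as possible. In your example, though, you can’t produce a full graph because of the control flow on the shape of a tensor.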
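One way to see graph breaks directly is a custom backend that simply records every FX graph Dynamo hands to it. This is an illustrative sketch under assumptions: the function `f`, the list `captured`, and `counting_backend` are hypothetical, and the break here comes from a branch on a tensor's *values* (`y.sum() > 0`), which Dynamo cannot decide while tracing. Passing `fullgraph=True` asks `torch.compile` to raise an error instead of splitting the function, which is the trade-off behind the question: code that needs a break either gets split into sub-graphs or fails to compile as one graph.

```python
import torch
import torch._dynamo
import torch.fx

captured = []  # one entry per FX graph Dynamo compiles


def counting_backend(gm: torch.fx.GraphModule, example_inputs):
    # A custom backend receives each captured graph; record it and run it unchanged.
    captured.append(gm)
    return gm.forward


def f(x: torch.Tensor) -> torch.Tensor:
    y = x.sin()
    # The branch depends on tensor *values*, unknown while tracing -> graph break.
    if y.sum() > 0:
        return y.cos()
    return y.tan()


x = torch.ones(4)
torch._dynamo.reset()  # start from an empty compilation cache
out = torch.compile(f, backend=counting_backend)(x)
n_graphs = len(captured)
print(f"graphs compiled: {n_graphs}")  # more than one means f was split

# fullgraph=True forbids graph breaks, so the same function now raises.
torch._dynamo.reset()  # clear cached compilations of f first
try:
    torch.compile(f, backend=counting_backend, fullgraph=True)(x)
    fullgraph_failed = False
except Exception as err:  # the exact exception type varies across PyTorch versions
    fullgraph_failed = True
    print(f"fullgraph=True raised {type(err).__name__}")
```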
You can work around this if your compiler supports dynamic shapes, which torch.compile(..., dynamic=True) lets you enable, but the speedups you get will be less dramatic than if you make your model non-dynamic.
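A rough way to see what `dynamic=True` changes for a shape-dependent branch is to count how many graphs get compiled across inputs of different sizes. This is a sketch under assumptions: `toy_example`, `count_compiles`, and the branch bodies are hypothetical, and the counting backend stands in for a real compiler. With `dynamic=False`, every new size is specialized and triggers a recompile; with `dynamic=True`, the size stays symbolic and the branch becomes a guard on it, so inputs on the same side of the branch reuse one graph. Code compiled for symbolic sizes cannot specialize on concrete sizes, which is one reason the speedups can be smaller.

```python
import torch
import torch._dynamo
import torch.fx


def toy_example(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical branch bodies around the shape check from the question.
    if x.shape[0] > 10:
        return x * 2
    return x - 1


def count_compiles(dynamic: bool, sizes) -> int:
    """Compile toy_example with the given setting and count the graphs built."""
    torch._dynamo.reset()  # start from an empty compilation cache
    graphs = []

    def counting_backend(gm: torch.fx.GraphModule, example_inputs):
        graphs.append(gm)
        return gm.forward

    compiled = torch.compile(toy_example, backend=counting_backend, dynamic=dynamic)
    for n in sizes:
        compiled(torch.ones(n))
    return len(graphs)


sizes = [12, 20]  # both sizes take the `> 10` branch
static_graphs = count_compiles(dynamic=False, sizes=sizes)
dynamic_graphs = count_compiles(dynamic=True, sizes=sizes)
print(f"dynamic=False: {static_graphs} graphs, dynamic=True: {dynamic_graphs} graphs")
```

Adding a size of 10 or less to `sizes` would fail the symbolic guard and trigger a second compile even with `dynamic=True`.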
Also, a great tool for understanding graph breaks in your model is