Hey @Jerome_Ku. The short answer is:
We have a handful of operator decompositions that always run by default, inside of the dispatcher (in C++), before making it into `__torch_dispatch__`.
linear/matmul is by far the most common:

- `aten.linear` decomposes into `aten.matmul` + `aten.transpose` here: pytorch/aten/src/ATen/native/Linear.cpp at main · pytorch/pytorch · GitHub
- `aten.matmul` decomposes into a few ops (either `aten.mm`, `aten.baddbmm`, or a few others) here: pytorch/aten/src/ATen/native/LinearAlgebra.cpp at main · pytorch/pytorch · GitHub
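
If you want to see this for yourself, here is a minimal sketch (my own illustration, not from the links above) that uses a `TorchDispatchMode` to log every op that actually reaches `__torch_dispatch__`. Running an `nn.Linear` forward under it, you should see the decomposed ops (e.g. `aten.t` / `aten.addmm`, or `aten.mm` depending on shapes), but never `aten.linear` itself:

```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class OpLogger(TorchDispatchMode):
    # Log every ATen op that reaches __torch_dispatch__, then run it as usual.
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        print(func)
        return func(*args, **(kwargs or {}))

lin = torch.nn.Linear(4, 4)
x = torch.randn(2, 4)
with OpLogger():
    lin(x)
# Expect ops like aten.t.default and aten.addmm.default in the output,
# but not aten.linear.default -- the decomposition already ran in C++.
```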
If you’re wondering why, the historical answer is that there are a number of ops that we don’t have dedicated derivative formulas for (e.g. `linear`), and so rather than writing a brand new formula, we just have the autograd engine decompose the op into more primitive ops that it does have formulas for (e.g. `aten.mm` and `transpose`).
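
You can see a hint of this in plain eager mode too. In this sketch (again my own illustration), the `grad_fn` on the output of `F.linear` comes from the decomposed op rather than from a dedicated "linear backward":

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 4, requires_grad=True)
w = torch.randn(3, 4, requires_grad=True)
b = torch.randn(3, requires_grad=True)

out = F.linear(x, w, b)
# Typically prints something like <AddmmBackward0 ...> -- there is no
# dedicated LinearBackward node; autograd recorded the decomposed op.
print(out.grad_fn)
```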
If you are interested in export and only care about inference, we actually recently made it so that exporting for inference can preserve all ATen ops, including these special ops like `aten.linear`, in the graph:
```python
import torch

m = torch.nn.Linear(16, 32)    # example sizes (stand-ins for the original "...")
args = (torch.randn(8, 16),)   # example inputs to export with
graph_module = torch.export.export(m, args).run_decompositions().module()
# you should see aten.linear, as long as you didn't manually specify that you want it decomposed
print(graph_module)
```