In the PyTorch Inductor source code there is a variable named extern_kernels, which stores kernels to be called; there is also torch.ops, which is monkey-patched to include the ops that get called. Here "op" and "kernel" mean roughly the same thing; strictly speaking, the kernel in this context is actually an op.
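(For context, my reading of the source, which may be simplified or out of date: extern_kernels is just a plain namespace object that Inductor fills with existing ATen-backed callables, around torch/_inductor/select_algorithm.py. A minimal sketch of that pattern, with the details simplified rather than the exact source:)

import torch

class KernelNamespace:
    # empty holder; attributes are attached dynamically at registration time
    pass

# the generated wrapper code imports this object and calls through it
extern_kernels = KernelNamespace()

# registration just binds an existing ATen-backed callable under a name,
# so extern_kernels.addmm is torch.addmm underneath
extern_kernels.addmm = torch.addmm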
Some nodes are converted to ExternKernelSchedulerNode, which generates code like:
extern_kernels.addmm(primals_2, primals_3, reinterpret_tensor(primals_1, (2, 3), (1, 2), 0), alpha=1, beta=1, out=buf0)
For other ops, a SchedulerNode is created instead, and the generated code looks like:
torch.ops.aten.bernoulli_.float(buf2, 0.8)
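Note that this generated line is an ordinary dispatcher call and runs standalone; a quick repro (buf2 here is just a scratch tensor I allocate myself):

import torch

buf2 = torch.empty(8, 8)
# the same call Inductor emits: fills buf2 in place with Bernoulli(p=0.8) samples
torch.ops.aten.bernoulli_.float(buf2, 0.8)
print(buf2.mean())  # roughly 0.8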
My question is: why not handle both cases the same way? E.g., treat addmm as an op too and generate code like:

torch.ops.aten.addmm()
We know that the implementation of extern_kernels.addmm is actually aten.addmm.
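This is easy to sanity-check: both entry points produce the same result (the shapes below are arbitrary):

import torch

bias = torch.randn(2, 3)
mat1 = torch.randn(2, 4)
mat2 = torch.randn(4, 3)

out1 = torch.addmm(bias, mat1, mat2)           # what extern_kernels.addmm wraps
out2 = torch.ops.aten.addmm(bias, mat1, mat2)  # the torch.ops path
assert torch.allclose(out1, out2)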
What is the point of having essentially the same function duplicated?