Different APIs result in different fusion groups, such as FusionGroup and CudaFusionGroup. Why is that, and what is the difference between them?
These are actually different fuser generations.
The “classic” 1st-gen fuser only handled pointwise ops and creates FusionGroup nodes. The newer fuser developed by a team at NVIDIA creates CudaFusionGroup nodes. To round off the trio, there are TensorExprGroup nodes, created by the TensorExpr/NNC fuser developed by a team at FB. The latter two also support some reductions.
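If you want to check which fuser produced a given graph, one option is to scan the printed graph for these node kinds. Below is a minimal sketch in plain Python. The helper `count_fusion_groups`, the `FUSER_BY_KIND` table and the `SAMPLE` text are my own illustration and not part of PyTorch; the sample is hand-written to resemble a graph dump, and I am assuming that group nodes print as `= prim::<Kind>_<n>(...)`.

```python
import re
from collections import Counter

# Node kinds named above, mapped to the fuser generation that creates them.
# (Hypothetical helper table for illustration, not a PyTorch API.)
FUSER_BY_KIND = {
    "prim::FusionGroup": "classic 1st-gen fuser",
    "prim::CudaFusionGroup": "NVIDIA fuser",
    "prim::TensorExprGroup": "TensorExpr/NNC fuser",
}

# Assumption: a group node is printed as "... = prim::<Kind>_<n>(...)".
# Requiring the "=" in front skips the "with prim::<Kind>_<n> = graph(...)"
# subgraph definitions, so each group is counted once.
_NODE_RE = re.compile(
    r"=\s*(prim::(?:FusionGroup|CudaFusionGroup|TensorExprGroup))(?![A-Za-z])"
)


def count_fusion_groups(graph_text):
    """Count fusion-group nodes per kind in a textual graph dump."""
    return Counter(_NODE_RE.findall(graph_text))


# Hand-written text that imitates a graph dump (not real PyTorch output).
SAMPLE = """
graph(%x : Tensor, %y : Tensor):
  %2 : Tensor = prim::TensorExprGroup_0(%x, %y)
  return (%2)
with prim::TensorExprGroup_0 = graph(%a : Tensor, %b : Tensor):
  %c : Tensor = aten::mul(%a, %b)
  %d : Tensor = aten::relu(%c)
  return (%d)
"""

for kind, n in count_fusion_groups(SAMPLE).items():
    print(f"{n} x {kind} ({FUSER_BY_KIND[kind]})")
```

On a real scripted function you would run it a few times so that the profiling executor gets to optimize it, and then pass in something like `str(torch.jit.last_executed_optimized_graph())`. Which group kind shows up (if any) depends on your PyTorch version, the device, and which fuser is enabled.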
A while ago, I wrote a blog post on the various fusers.
Thanks for your reply. I read your blog first. It seems that the mechanism is not easy to figure out.
The JIT optimization steps are probably among the most sophisticated bits in PyTorch (along with the dispatcher, …). For a deep dive on one of the fusers, I can also enthusiastically recommend Christian Sarofeen’s talk (I think you need to register to see it).
Excellent! I will look into it and email you if I run into any questions. Many thanks.