I posted a series of questions on the forum today. I’ve listed them here in the hope that putting them together will better shed light on what I do and don’t know.
- In Dynamo+AOTAutograd, why run FakeTensor through the code multiple times?
- In resolving DispatchKeySet, is PythonDispatch called first or last? (this post)
- Better understanding why AOTAutograd decomposes fused_rms_norm_backward for CUDA, but not for Meta tensors
Posts introducing __torch_dispatch__ or TorchDispatchMode seem to suggest that PythonDispatch is called after features such as Autograd, right before the actual function is called. For example, this dev note seems to suggest that the Python dispatcher comes after Autograd, Autocast, etc.
However, in DispatchKey.h the PythonDispatch key is at enum value/bit 63. Because the implementation picks the most significant set bit, this makes me think PythonDispatch has the highest priority.
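To make my reading of the priority rule concrete, here is a toy model in Python (not PyTorch's actual code). A keyset is a 64-bit integer, the most significant set bit picks which key runs, and, as I understand redispatch, each kernel then hands off with its own bit and everything above it masked out. Only PythonDispatch at bit 63 comes from DispatchKey.h as I read it; the other key names and bit positions are hypothetical, and the real DispatchKeySet also splits its bits into backend and functionality parts, which this sketch ignores.

```python
from typing import List

# Toy model of dispatch-key priority: one bit per key, highest set bit wins.
# Only PythonDispatch = 63 comes from the post; the other names and
# positions are hypothetical and chosen just for illustration.
TOY_KEYS = {
    "CPU": 0,              # hypothetical backend bit
    "Autograd": 40,        # hypothetical
    "Autocast": 45,        # hypothetical
    "PythonDispatch": 63,  # as read from DispatchKey.h in the post
}
NAMES_BY_BIT = {bit: name for name, bit in TOY_KEYS.items()}


def keyset(*names: str) -> int:
    """Build a toy keyset by OR-ing together one bit per key name."""
    bits = 0
    for name in names:
        bits |= 1 << TOY_KEYS[name]
    return bits


def highest_priority_key(ks: int) -> str:
    """Return the key whose bit is the most significant set bit."""
    if ks == 0:
        raise ValueError("empty keyset")
    return NAMES_BY_BIT[ks.bit_length() - 1]


def dispatch_order(ks: int) -> List[str]:
    """Visit keys from highest to lowest priority, as if each kernel
    redispatched with its own bit and all higher bits masked out."""
    order = []
    while ks:
        top = ks.bit_length() - 1
        order.append(NAMES_BY_BIT[top])
        ks &= (1 << top) - 1  # keep only bits strictly below the current key
    return order


if __name__ == "__main__":
    ks = keyset("CPU", "Autograd", "Autocast", "PythonDispatch")
    print(highest_priority_key(ks))  # PythonDispatch
    print(dispatch_order(ks))        # ['PythonDispatch', 'Autocast', 'Autograd', 'CPU']
```

Under this reading PythonDispatch would be visited first, before Autocast and Autograd, which is the opposite of the ordering the dev note describes.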
Am I misunderstanding something here, or is the dev note ‘outdated’?