I posted a series of questions on the forum today. I've listed them together here in the hope that, taken as a group, they better show what I do and don't understand.
- In Dynamo+AOTAutograd, why run FakeTensor through the code multiple times?
- In resolving DispatchKeySet, is PythonDispatch called first or last? (this post)
- Better understanding why AOTAutograd decomposes `fused_rms_norm_backward` for CUDA, but not for Meta tensors
When introducing `__torch_dispatch__` or `TorchDispatchMode`, some posts seem to suggest that PythonDispatch is called after features such as Autograd, right before the actual function is called. For example, this dev note seems to suggest that the Python dispatcher comes after `Autograd`, `Autocast`, etc.
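For concreteness, here is a minimal logging mode I put together to illustrate that reading (assuming a recent PyTorch where `torch.utils._python_dispatch.TorchDispatchMode` is available; the printed op names are just what I'd expect to see, so treat them as illustrative):

```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class LoggingMode(TorchDispatchMode):
    """Print every ATen op that reaches __torch_dispatch__."""
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        print("intercepted:", func)   # e.g. aten.mul.Tensor, aten.sum.default
        return func(*args, **kwargs)

x = torch.randn(3, requires_grad=True)
with LoggingMode():
    y = (x * 2).sum()

# y still has a grad_fn, i.e. Autograd already recorded the backward graph
# before the mode saw the ops -- consistent with the
# "Python dispatch comes after Autograd" reading of the dev note.
print(y.grad_fn)
```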
However, if I look at `DispatchKey.h`, the `PythonDispatch` key sits at enum/bit #63. The implementation looks at the most significant set bit, which makes me think `PythonDispatch` has the highest priority.
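To make my reading of the bit logic concrete, here is a toy, pure-Python sketch of the priority rule as I understand it. This is not the actual implementation, and the Autograd bit position below is made up purely for illustration; the real values live in `DispatchKey.h`:

```python
# Toy model of my understanding: a DispatchKeySet is a bitset, and the
# "highest priority" key is simply the most significant bit that is set.
AUTOGRAD_BIT = 12          # hypothetical position, for illustration only
PYTHON_DISPATCH_BIT = 63   # the bit index I see referenced in DispatchKey.h

def highest_priority_bit(keyset: int) -> int:
    """Return the index of the most significant set bit."""
    return keyset.bit_length() - 1

keyset = (1 << AUTOGRAD_BIT) | (1 << PYTHON_DISPATCH_BIT)
print(highest_priority_bit(keyset))  # 63 -> PythonDispatch would be resolved first
```

If this toy model is right, the key at bit 63 should be consulted before Autograd, which seems to be the opposite of what the dev note describes.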
Am I misunderstanding something here, or is the dev note ‘outdated’?