Why did PyTorch introduce the kernel (processor) mechanism?

I am using the C++ API of PyTorch and am not familiar with the Python API. I have noticed the kernel mechanism in the PyTorch source code, that is, registering a processor for a given operation at run time (before it is used), as implemented for example in aten/src/ATen/core/boxing|dispatch|op_registration.

Why was this mechanism introduced? Is it a common use case for the user to pick one kernel (processor) from one library and another kernel from a different library, with both running on the same device? Is it meant to meet the requirements of some complex NNs? Is it related to how the Python API is used?

Thank you very much!


This can happen during regular NN training, for example, where you would do all the preprocessing on the CPU while the neural net itself would run on a GPU (potentially with some layers on a TPU?).
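To make the idea concrete, here is a toy sketch in plain C++ of run-time kernel registration and per-device lookup. It is not PyTorch's actual dispatcher code: `Device`, `Tensor`, `Dispatcher`, `register_kernel`, `call`, `add_cpu`, `add_gpu_stub` and `make_toy_dispatcher` are all made-up names for illustration, and the "GPU" kernel is only a stub that runs on the host. The real mechanism is considerably more involved, but the basic shape (a table of kernels filled in before use, with the kernel chosen from the arguments at call time) is roughly this:

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; none of these are PyTorch types.
enum class Device { CPU, GPU };

struct Tensor {
    Device device;
    std::vector<float> data;
};

using Kernel = std::function<Tensor(const Tensor&, const Tensor&)>;

class Dispatcher {
public:
    // Register (or override) the kernel for an (operator name, device) pair.
    void register_kernel(const std::string& op, Device device, Kernel kernel) {
        table_[std::make_pair(op, device)] = std::move(kernel);
    }

    // Pick the kernel from the device of the first argument, then call it.
    Tensor call(const std::string& op, const Tensor& a, const Tensor& b) const {
        auto it = table_.find(std::make_pair(op, a.device));
        if (it == table_.end()) {
            throw std::runtime_error("no kernel registered for op '" + op + "'");
        }
        return it->second(a, b);
    }

private:
    std::map<std::pair<std::string, Device>, Kernel> table_;
};

// Element-wise sum; assumes both inputs have the same length.
static std::vector<float> sum(const Tensor& a, const Tensor& b) {
    std::vector<float> out(a.data.size());
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = a.data[i] + b.data[i];
    }
    return out;
}

// CPU kernel for "add".
Tensor add_cpu(const Tensor& a, const Tensor& b) {
    return Tensor{Device::CPU, sum(a, b)};
}

// Stub standing in for a GPU kernel: it computes on the host and only tags
// the result as GPU. A real backend would launch device code here.
Tensor add_gpu_stub(const Tensor& a, const Tensor& b) {
    return Tensor{Device::GPU, sum(a, b)};
}

// Each "backend" registers its own kernel before the operator is used.
Dispatcher make_toy_dispatcher() {
    Dispatcher d;
    d.register_kernel("add", Device::CPU, add_cpu);
    d.register_kernel("add", Device::GPU, add_gpu_stub);
    return d;
}
```

With this, calling `make_toy_dispatcher().call("add", a, b)` runs `add_cpu` when `a` is a CPU tensor and `add_gpu_stub` when it is tagged as GPU, so the calling code stays the same while the preprocessing and the network run on different devices. A new backend only has to register its kernels; the call sites do not change.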

@albanD. Thank you for the information.