An operator is ultimately executed through one of the *KernelFunction.callxxx* methods. Taking *callUnboxedOnly* as an example, the statement that invokes the operation is:

```
return (*func)(getFunctor_(), std::forward<Args>(args)...);
```

Here *func* is *wrap_kernel_functor_unboxed_::call*, and *getFunctor_()* returns the functor that performs the real operation (e.g. an LSTM cell).

I am wondering why the functor has to be wrapped by another static function (*wrap_kernel_functor_unboxed_::call*). What if we moved the code in *wrap_kernel_functor_unboxed_::call* into *KernelFunction.callUnboxedOnly*, so that the functor is executed directly? Why do we need both a *functor_* and an *unboxed_kernel_func_* in *KernelFunction*?

Thank you very much!

- Once a KernelFunction has been stored in the registry, it has lost its concrete type. To users, it is just a KernelFunction and an OperatorKernel pointer.
- When call() is invoked, only the input and return types are specified. It is natural for the user (e.g. an NN model) to supply these, but unreasonable to ask the user to supply the kernel's template parameters, since that would undermine the extensibility and flexibility of the framework.
- If the facade were, for example, a simple OperatorKernel.operator()(args) that every subclass of OperatorKernel had to implement, the number of permutations of input types, plus conflicting return types for the same input types, would make that strategy almost impossible.
- So the OperatorKernel is wrapped once more, and a static function gives every operation a uniform calling format, independent of the inheritance tree.
- Besides the operator's input arguments, the call function also receives a functor argument, to take advantage of functor/lambda properties. (Not sure if this is the best solution.)
- For boxed/unboxed functors, the specialized wrap_kernel_functor_(un)boxed struct carries the functor's type through its KernelFunctor template parameter, and the static call function static_casts the OperatorKernel argument to KernelFunctor.
- The call function also manipulates the stack for boxed kernels.
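
To make the points above concrete, here is a toy sketch of the pattern as I understand it. It is not the actual c10 code: every name in it (ToyOperatorKernel, ToyKernelFunction, toy_wrap_unboxed, toy_wrap_boxed, ToyStack, AddKernel, ScaleKernel, make_toy_registry) is hypothetical, and the boxed stack is assumed to hold plain ints rather than real IValues. The idea is that the concrete functor type is known only when the kernel is registered, so it gets baked into a static wrapper whose address is stored next to the type-erased functor pointer.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins with hypothetical names; NOT the real c10 classes.
struct ToyOperatorKernel {
    virtual ~ToyOperatorKernel() = default;  // note: no virtual operator() here
};

// Two functors whose operator() signatures have nothing in common.
struct AddKernel final : ToyOperatorKernel {
    int operator()(int a, int b) const { return a + b; }
};
struct ScaleKernel final : ToyOperatorKernel {
    explicit ScaleKernel(double f) : factor(f) {}
    double operator()(double x) const { return factor * x; }
    double factor;
};

// Unboxed wrapper: the only place that still knows the concrete functor type.
template <class KernelFunctor, class Return, class... Args>
struct toy_wrap_unboxed {
    static Return call(ToyOperatorKernel* functor, Args... args) {
        auto* typed = static_cast<KernelFunctor*>(functor);
        return (*typed)(std::forward<Args>(args)...);
    }
};

// Boxed wrapper (assumed here to handle only two-int kernels): pops the
// arguments from the stack and pushes the result back.
using ToyStack = std::vector<int>;
template <class KernelFunctor>
struct toy_wrap_boxed {
    static void call(ToyOperatorKernel* functor, ToyStack* stack) {
        auto* typed = static_cast<KernelFunctor*>(functor);
        int b = stack->back(); stack->pop_back();
        int a = stack->back(); stack->pop_back();
        stack->push_back((*typed)(a, b));
    }
};

class ToyKernelFunction {
public:
    using ErasedFn = void (*)();  // generic function pointer used for storage
    using BoxedFn = void (*)(ToyOperatorKernel*, ToyStack*);

    // The functor type is known only here, at registration time.
    template <class KernelFunctor, class Return, class... Args>
    static ToyKernelFunction make(std::unique_ptr<KernelFunctor> f,
                                  BoxedFn boxed = nullptr) {
        ToyKernelFunction k;
        k.functor_ = std::move(f);
        k.unboxed_ = reinterpret_cast<ErasedFn>(
            &toy_wrap_unboxed<KernelFunctor, Return, Args...>::call);
        k.boxed_ = boxed;
        return k;
    }

    // The caller supplies only return and argument types, never the functor
    // type. They must match the registered signature exactly, otherwise the
    // behavior is undefined.
    template <class Return, class... Args>
    Return callUnboxed(Args... args) const {
        assert(unboxed_ != nullptr);
        using Sig = Return(ToyOperatorKernel*, Args...);
        auto* func = reinterpret_cast<Sig*>(unboxed_);
        return (*func)(functor_.get(), std::forward<Args>(args)...);
    }

    void callBoxed(ToyStack* stack) const {
        assert(boxed_ != nullptr);
        boxed_(functor_.get(), stack);
    }

private:
    std::unique_ptr<ToyOperatorKernel> functor_;
    ErasedFn unboxed_ = nullptr;
    BoxedFn boxed_ = nullptr;
};

// A "registry" of kernels that all share one static type.
inline std::vector<ToyKernelFunction> make_toy_registry() {
    std::vector<ToyKernelFunction> registry;
    registry.push_back(ToyKernelFunction::make<AddKernel, int, int, int>(
        std::make_unique<AddKernel>(), &toy_wrap_boxed<AddKernel>::call));
    registry.push_back(ToyKernelFunction::make<ScaleKernel, double, double>(
        std::make_unique<ScaleKernel>(2.0)));
    return registry;
}
```

With this, `registry[0].callUnboxed<int>(2, 3)` reaches AddKernel and `registry[1].callUnboxed<double>(1.5)` reaches ScaleKernel, although the caller never names either class. If the body of the wrapper were moved into callUnboxed, the static_cast would need KernelFunctor as a template parameter at the call site, which is exactly the information the registry no longer has. In this sketch at least, that is why both the functor pointer and the wrapper's function pointer are stored.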

Some ideas. I haven't completely convinced myself; I'm just putting them here for discussion.