How to use the RECORD_FUNCTION macro

To measure the performance of a custom C++ function, should I use the RECORD_FUNCTION macro, or follow the steps in the Dispatcher::callWithDispatchKeySlowPath function?

  1. It looks better to call at::shouldRunRecordFunction(&pre_sampled) first and only add a RecordFunction to the execution when it returns true, which reduces overhead when not profiling. Is there a reason at::shouldRunRecordFunction(&pre_sampled) is not invoked in the macro?
  2. The pre_sampled parameter does not seem to take effect in RECORD_FUNCTION. Does pre_sampled bring any performance benefit?
  3. How does the RECORD_FUNCTION macro handle nested op invocations?
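
To make questions 1 and 3 concrete, here is a minimal standalone sketch of the pattern I have in mind: a cheap check runs first, and the recording guard is only constructed when that check passes. Every name in it (should_record, ScopedRecord, inner_op, outer_op, the globals) is a hypothetical stand-in written for this post, not a PyTorch API, and it does not model pre_sampled or sampling at all. It needs C++17 for std::optional.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are PyTorch APIs.
static bool g_profiling_enabled = false;     // toggled by an imaginary profiler
static int g_guards_constructed = 0;         // counts how often a guard is built
static std::vector<std::string> g_events;    // recorded begin/end events

// Cheap predicate evaluated before any guard is constructed.
inline bool should_record() { return g_profiling_enabled; }

// RAII guard: logs "begin" on construction and "end" on destruction.
class ScopedRecord {
 public:
  explicit ScopedRecord(std::string name) : name_(std::move(name)) {
    ++g_guards_constructed;
    g_events.push_back("begin:" + name_);
  }
  ~ScopedRecord() { g_events.push_back("end:" + name_); }
  ScopedRecord(const ScopedRecord&) = delete;
  ScopedRecord& operator=(const ScopedRecord&) = delete;

 private:
  std::string name_;
};

// Inner "op": the guard is only constructed when the check passes.
int inner_op(int x) {
  std::optional<ScopedRecord> guard;
  if (should_record()) guard.emplace("inner_op");
  return x + 1;
}

// Outer "op" that calls the inner one, so the two scopes nest.
int outer_op(int x) {
  std::optional<ScopedRecord> guard;
  if (should_record()) guard.emplace("outer_op");
  return inner_op(x) * 2;
}
```

With the flag off, no guard is constructed at all. With it on, the events nest as begin:outer_op, begin:inner_op, end:inner_op, end:outer_op. What I would like to know is whether RECORD_FUNCTION gives the same two properties.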