How to use PyTorch profiler record_function from C++ code

I want to add some profiling data points for the custom C++ code I use in training. I know I can use the profiler's record_function to add entries from Python code, but my C++ op is pretty heavy and I want more fine-grained output from the C++ side. I tracked down the implementation of record_function and found this C++ implementation code:

But when I called record_function_enter_new and record_function_exit_new from my C++ code, the events didn't show up in the final profiling result. I can't find any useful examples in the codebase of how to call those functions. Any pointers are appreciated.