How can I get the entire execution path of a PyTorch operator such as torch.matmul, from the Python level down to the C++ or CUDA level?

I want to see the whole execution path of a PyTorch op, in other words, its call stack. How can I do that?