Nsight Compute profiling for PyTorch?

Hi, I'm new here, so sorry if this post isn't really PyTorch-related. I want to profile Hugging Face Transformers models, which are mostly written in PyTorch, using Nsight Compute (ncu). Has anyone tried this? I know about torch.cuda.profiler, but I want to measure the GPU occupancy of each kernel rather than the percentage of total execution time each kernel takes.
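For reference, here is roughly the setup I have in mind. It is an untested sketch that assumes a CUDA GPU, `transformers` installed, and `bert-base-uncased` as a stand-in model. `ncu_command` and `profiled_forward` are just names I made up. The script first runs a few warm-up passes. It then brackets a single forward pass with `torch.cuda.profiler.start()`/`stop()`, which call `cudaProfilerStart`/`cudaProfilerStop`. ncu is launched with `--profile-from-start off`, so only that pass is profiled, and with `--section Occupancy`, so the report shows theoretical and achieved occupancy per kernel.

```python
import shlex
import sys

# ncu options: wait for cudaProfilerStart before collecting, and collect only the
# built-in "Occupancy" section (theoretical vs. achieved occupancy per kernel).
NCU_FLAGS = ("--profile-from-start", "off", "--section", "Occupancy")


def ncu_command(script, report="hf_occupancy", script_args=()):
    """Return the argv list that runs `script` under Nsight Compute (hypothetical helper)."""
    return ["ncu", *NCU_FLAGS, "-o", report, sys.executable, script, *script_args]


def profiled_forward(model_name="bert-base-uncased", warmup=3):
    """Warm up a Transformers model, then profile exactly one forward pass."""
    # Lazy imports: the command builder above works without torch or a GPU.
    import torch
    import torch.cuda.profiler as cuda_profiler
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).cuda().eval()
    inputs = tokenizer("Hello, Nsight Compute!", return_tensors="pt").to("cuda")

    with torch.no_grad():
        # Warm-up passes so one-time setup work is not what gets profiled.
        for _ in range(warmup):
            model(**inputs)
        torch.cuda.synchronize()

        cuda_profiler.start()  # cudaProfilerStart(): ncu starts collecting here
        model(**inputs)
        torch.cuda.synchronize()
        cuda_profiler.stop()  # cudaProfilerStop(): ncu stops collecting here


if __name__ == "__main__":
    if sys.argv[1:] == ["run"]:
        profiled_forward()
    else:
        # Print the command that re-launches this script under ncu in "run" mode.
        print(shlex.join(ncu_command(sys.argv[0], script_args=("run",))))
```

Running the script with no arguments only prints the ncu command line. That command re-runs the script with `run` under ncu and writes an `hf_occupancy` report, which can be opened in the Nsight Compute GUI.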

Thanks~