Profiling computations at the operation level

I am trying to profile all of the addition and multiplication computations performed at the very bottom operation level (two input operands and the result, for example 2 + 3 = 5) during the inference of a model.

Is it possible to achieve this in Python (for example, with the PyTorch profiler), or do I need to look into the C++ backend? (I would also appreciate some hints in case I need to go deep into the C++ side.)

Thank you!

Based on my experience:

If you only want to measure the execution time of some operations, TensorBoard can help you, no matter whether your code runs on the CPU or the GPU.
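
For example, a minimal sketch using `torch.profiler` with its TensorBoard trace handler (the model, input shape, and log directory are just placeholders, and viewing the trace needs the `torch-tb-profiler` plugin and `tensorboard --logdir=./log`):

```python
import torch
import torchvision.models as models  # torchvision's resnet18 is only a stand-in model
from torch.profiler import profile, ProfilerActivity, tensorboard_trace_handler

model = models.resnet18().eval()
inputs = torch.randn(1, 3, 224, 224)

# Profile the CPU, and the GPU as well if one is available.
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    model, inputs = model.cuda(), inputs.cuda()
    activities.append(ProfilerActivity.CUDA)

# Write a trace into ./log that TensorBoard can display.
with profile(
    activities=activities,
    on_trace_ready=tensorboard_trace_handler("./log"),
    record_shapes=True,
):
    with torch.no_grad():
        model(inputs)
```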

If you want to dig into the GPU kernel level for more detailed information, such as SM utilization or memory workload, you can use Nsight Compute to profile your program. It provides detailed information about GPU utilization.

Thank you for your suggestions! Actually, what I am trying to do is replace the multiplications and additions inside the convolutions with my own custom operations (like myadd and mymult) for some purposes, so I am looking for a way to do that… (I only need to replace them during the forward pass, though.)
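
To illustrate the direction I mean, here is a rough sketch that rewrites a 2D convolution with `F.unfold` (im2col) so the elementwise multiply and the accumulation become explicit tensor operations that could be swapped out; `mymult` and `myadd_reduce` are just placeholder names, and this is of course much slower than the fused convolution kernels:

```python
import torch
import torch.nn.functional as F

# Placeholder elementwise replacements -- this is where the custom
# behaviour would go; for now they just reproduce * and sum().
def mymult(a, b):
    return a * b

def myadd_reduce(x, dim):
    return x.sum(dim=dim)

def my_conv2d(x, weight, bias=None, stride=1, padding=0):
    """2D convolution written via unfold (im2col) so that the per-element
    multiply and the accumulation are explicit and swappable."""
    n, c_in, h, w = x.shape
    c_out, _, kh, kw = weight.shape
    # Columns of shape (N, C_in*kh*kw, L), L = number of output positions.
    cols = F.unfold(x, kernel_size=(kh, kw), stride=stride, padding=padding)
    w_flat = weight.view(c_out, -1)                       # (C_out, C_in*kh*kw)
    # Broadcast to (N, C_out, C_in*kh*kw, L), multiply, then reduce over dim 2.
    prod = mymult(cols.unsqueeze(1), w_flat.unsqueeze(0).unsqueeze(-1))
    out = myadd_reduce(prod, dim=2)                       # (N, C_out, L)
    if bias is not None:
        out = out + bias.view(1, -1, 1)
    h_out = (h + 2 * padding - kh) // stride + 1
    w_out = (w + 2 * padding - kw) // stride + 1
    return out.view(n, c_out, h_out, w_out)

# Sanity check against the built-in convolution.
x = torch.randn(2, 3, 8, 8)
w = torch.randn(4, 3, 3, 3)
b = torch.randn(4)
assert torch.allclose(my_conv2d(x, w, b, padding=1),
                      F.conv2d(x, w, b, padding=1), atol=1e-4)
```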

The PyTorch profiler does exactly that for PyTorch functions and has good analysis options.
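
For instance, something along these lines prints a per-operator summary; note that it reports PyTorch/ATen operators such as aten::conv2d, not the individual scalar additions and multiplications inside them (the model and input are placeholders):

```python
import torch
import torchvision.models as models  # placeholder model and input
from torch.profiler import profile, ProfilerActivity

model = models.resnet18().eval()
inputs = torch.randn(1, 3, 224, 224)

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with torch.no_grad():
        model(inputs)

# Per-operator summary, grouped by the input shapes each operator saw.
print(prof.key_averages(group_by_input_shape=True).table(
    sort_by="cpu_time_total", row_limit=15))
```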

Then you may get some help here.