How to get CUDA Kernels?

Hello everyone,
I am doing some research on deep learning performance on GPUs. I want to capture the CUDA kernels that are executed on the GPU, so I added these lines to my training script:

import os

os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"
os.environ["TORCH_CPP_LOG_LEVEL"] = "INFO"

but I get nothing in the logs!
Do you have any suggestions?
Thanks !

Hi @abderrahim_elhazmi, I don't have suggestions regarding your question, but I'm currently looking for the same thing: a way to list the kernels executed on the GPU.

Could you please let me know where you found those two lines? And does setting these two environment variables log the kernels being executed on the GPU?

Thanks a lot !

You could use a profiler such as Nsight Systems, which will show the called CUDA kernels.
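If you want to stay inside Python instead of running Nsight Systems, PyTorch's built-in `torch.profiler` can also record the kernels launched during a step. A minimal sketch (the model and input sizes here are arbitrary placeholders; on a machine without a GPU it falls back to CPU ops only):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Record CUDA activity when a GPU is available, otherwise CPU only.
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

# Placeholder model and batch; substitute your own training step.
model = torch.nn.Linear(128, 64)
x = torch.randn(32, 128)
if torch.cuda.is_available():
    model, x = model.cuda(), x.cuda()

# Profile one forward + backward pass.
with profile(activities=activities) as prof:
    model(x).sum().backward()

# Each row names an op/kernel and its measured time.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```

With `ProfilerActivity.CUDA` enabled, the table includes the individual CUDA kernel names and their device times, which is roughly the information Nsight Systems would show in its timeline.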


Hey @Vinayaka_Hegde, I found them in this discussion:
Performance Impact of TORCH_CPP_LOG_LEVEL=INFO and TORCH_DISTRIBUTED_DEBUG=DETAIL - C++ - PyTorch Forums