I am doing some research on deep learning performance on GPUs. I want to get the CUDA kernels that are executed on the GPU, so I added these lines to my training script:
os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"
os.environ["TORCH_CPP_LOG_LEVEL"] = "INFO"
but I get nothing in the logs!
Do you have any suggestions?
Hi @abderrahim_elhazmi, I don't have suggestions regarding your question, but I'm currently looking into the same thing - mainly getting the kernels executed on the GPU.
Could you please let me know the source where you found those two lines? And does setting these two env variables give us logs of the kernels being executed on the GPU?
Thanks a lot !
You could use a profiler such as Nsight Systems, which will show the called CUDA kernels.
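As an alternative that stays inside Python, `torch.profiler` can also list the kernels launched during a training step. A minimal sketch (the toy `Linear` model and tensor shapes here are just placeholders, not from the original post):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Toy model and input, standing in for a real training step.
model = torch.nn.Linear(128, 64)
x = torch.randn(32, 128)

# Record CPU ops and, when a GPU is present, the CUDA kernels they launch.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    y = model(x)

# Print a per-operator summary; on a GPU run, CUDA kernel times appear here.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```

On a machine with a GPU, sorting by `cuda_time_total` instead shows where GPU time goes, and `prof.export_chrome_trace("trace.json")` dumps a timeline you can open in `chrome://tracing`. Nsight Systems remains the better tool if you need driver-level detail beyond what the framework sees.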