How to learn the implementation of basic operations

Hi, I am new to PyTorch and GPU programming.

I want to learn how some basic tensor operations in PyTorch (e.g. ‘.sum()’) are implemented on the GPU, that is, how PyTorch uses GPU parallelization for each operation.

Can somebody show me which part of the PyTorch code I should start with to learn about this?

Thank you

Hmm, it’s hard to say, but I’d say PyTorch itself doesn’t really implement these things from scratch. On the GPU everything relies on CUDA and cuDNN, so learning PyTorch is very far from understanding CUDA programming.

To learn CUDA programming, you can have a look at the NVIDIA courses.
There is one with Numba and Python that is easy to follow.
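
To give a rough idea of what such a course covers for something like a sum: parallel sums on a GPU are commonly written as a tree reduction, where each step adds pairs of elements and halves the number of active threads. Below is a minimal sketch of that pattern in plain Python, running on the CPU. This is only an illustration, not PyTorch's actual kernel; the name `block_sum` and the `shared` list (standing in for a block's shared memory) are made up for this example.

```python
def block_sum(values):
    """Simulate a tree reduction as one GPU thread block might perform it."""
    # 'shared' is a hypothetical stand-in for a block's shared memory.
    shared = list(values)
    n = len(shared)

    # Pad with zeros up to the next power of two so each step halves cleanly.
    size = 1
    while size < n:
        size *= 2
    shared += [0] * (size - n)

    stride = size // 2
    while stride > 0:
        # On a GPU, each 'tid' here would be a separate thread,
        # and all of them would do their addition at the same time.
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        # A real kernel would synchronize the threads of the block here.
        stride //= 2

    # After about log2(size) steps, the total sits in the first slot.
    return shared[0]


print(block_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```

With 8 elements this takes 3 parallel steps instead of 7 sequential additions, which is where the speed-up comes from. A real kernel also has to combine the partial sums of many blocks, which this sketch leaves out.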

I see. Is there any way to see how PyTorch tensor operations work at the lower level (e.g. how they use cuDNN)? My aim is to understand how the basic tensor operations work at the lower level.