For example, after I build a model in Python, is it possible for me to print all the specific CUDA kernel functions that will be used? Is it true that every time a kernel is launched, the size of the kernel is calculated on the fly instead of being determined statically before training starts? Thanks
It’s not possible to predict which kernels will be called in your script before running it, as this would need a sort of static code analysis, which would also have to know your current setup, since the kernels depend on the GPU. Also, if you are using cuDNN benchmarking, the actual kernel choice depends on the profiling, which is non-deterministic and could depend on the workload your machine is under at the moment.
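That said, if you only need the kernels that are *actually launched* at runtime (rather than a static prediction), you can record them with `torch.profiler`. A minimal sketch, using a small `nn.Linear` as a stand-in for your model and falling back to CPU when no GPU is present:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Profile on GPU if one is available; otherwise fall back to CPU so the
# same script still runs (CUDA kernel names only appear on a GPU machine).
device = "cuda" if torch.cuda.is_available() else "cpu"
activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

model = torch.nn.Linear(128, 64).to(device)
x = torch.randn(32, 128, device=device)

with profile(activities=activities) as prof:
    model(x)

# On a GPU this table lists the CUDA kernels that were actually launched
# (e.g. cuBLAS GEMM kernels) alongside the ATen ops that called them.
table = prof.key_averages().table(sort_by="self_cpu_time_total")
print(table)
```

Note this only tells you which kernels were picked for *this* run on *this* GPU and input shape; a different device, shape, or cuDNN benchmark outcome can select different kernels.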
Could you explain a bit what “size” of the kernel you are referring to?
Thanks for the reply! @ptrblck
The size means the “<<<X,Y,Z>>>” launch configuration before the CUDA kernel function call. Having gone through some of the code, I now think the size for a given function is the same across different runs, but is recomputed at runtime on every call to that CUDA kernel. Is this correct (after benchmarking has picked the actual kernels for a static model)?
And for cuDNN kernels, PyTorch has no way to decide the size, right? The “<<<X,Y,Z>>>” is determined inside cuDNN as a black box.
Yes, cuDNN kernel launches are not exposed.
The kernel launch parameters are determined by the workload and would thus also need some kind of static analysis to know in advance; I’m not sure what that would look like.
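As a concrete illustration of that workload dependence: for a typical elementwise kernel, the block size is a fixed constant while the grid size is derived from the tensor’s element count on every launch. A hypothetical helper (illustrative only, not PyTorch’s actual heuristics) that computes the `<<<grid, block>>>` values:

```python
def launch_config(numel: int, threads_per_block: int = 256):
    """Compute a 1D <<<grid, block>>> pair for an elementwise kernel.

    The block size is a fixed constant; the grid size is recomputed from
    the number of elements on every call, which is why the launch
    configuration is only known at runtime.
    """
    blocks = (numel + threads_per_block - 1) // threads_per_block  # ceil division
    return blocks, threads_per_block

# Launch configuration for a 1,000,000-element tensor:
print(launch_config(1_000_000))  # -> (3907, 256)
```

So the same kernel function gets the same configuration for the same input shape, but the values are recomputed from the workload at each call site rather than being fixed before training starts.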