Using nsys, we can profile an inference/training workload and see which kernels are launched.
I want to profile these kernels independently (e.g., with some random input), but how can I dump these kernels after inference/training?
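The first step is getting the list of kernels out of the nsys run in a scriptable form. The sketch below assumes the kernel summary was exported to CSV, for example with something like `nsys stats --report cuda_gpu_kern_sum --format csv`. Report names and column headers differ between Nsight Systems versions, so treat that command as an assumption. The parser relies only on a header column called `Name` and returns the unique kernel names in order of first appearance. It does not extract kernel binaries or their launch arguments, so it covers only the "which kernels ran" part of the question.

```python
import csv
import io
from typing import List


def unique_kernel_names(csv_text: str, name_column: str = "Name") -> List[str]:
    """Collect unique kernel names from a CSV kernel summary.

    Assumption: csv_text starts with a header row that contains `name_column`
    (strip any leading non-CSV lines from the nsys output first).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Reading fieldnames consumes the header row.
    if reader.fieldnames is None or name_column not in reader.fieldnames:
        raise ValueError(f"column {name_column!r} not found in CSV header")

    seen = set()
    names: List[str] = []
    for row in reader:
        name = (row[name_column] or "").strip()
        # Keep first occurrence only, preserving report order.
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


if __name__ == "__main__":
    # Hypothetical sample shaped like a kernel-summary CSV; values are illustrative only.
    # Templated names contain commas, so they appear quoted, and the csv module handles that.
    sample = (
        "Time (%),Instances,Name\n"
        "55.0,10,my_gemm_kernel\n"
        '30.0,4,"void my_reduce<float, 256>(float*)"\n'
        "15.0,6,my_relu_kernel\n"
    )
    for kernel in unique_kernel_names(sample):
        print(kernel)
```

The resulting names can then be matched against the library or framework ops that launch them when building standalone benchmarks with random inputs.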