How to dump the kernels after inference/training?

Using nsys, we can profile the inference/training workload and see which kernels are launched.
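
As a rough sketch of that step (not an answer to the question below): the kernel names in a profile can be listed once the nsys report has been exported to SQLite. This assumes the export contains a `CUPTI_ACTIVITY_KIND_KERNEL` table whose `shortName` column points into a `StringIds` table, which may differ between nsys versions. The file name `report.sqlite` and the helper `list_kernels` are hypothetical.

```python
import sqlite3
from collections import Counter


def list_kernels(conn):
    """Count launches per kernel name in an nsys SQLite export.

    Assumed schema (check it against your nsys version):
      CUPTI_ACTIVITY_KIND_KERNEL.shortName -> StringIds.id
      StringIds.value holds the kernel name.
    """
    rows = conn.execute(
        "SELECT s.value "
        "FROM CUPTI_ACTIVITY_KIND_KERNEL AS k "
        "JOIN StringIds AS s ON k.shortName = s.id"
    ).fetchall()
    return Counter(name for (name,) in rows)


# Example usage (hypothetical file name):
#   conn = sqlite3.connect("report.sqlite")
#   for name, count in list_kernels(conn).most_common():
#       print(count, name)
```

This only recovers kernel names and launch counts from the trace. It does not extract the kernel code or its input tensors.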

I want to profile these kernels independently (e.g., with random inputs), but how can I dump them after the inference/training run?