C++ PyTorch model inference inside of a running CUDA kernel


I have a CUDA C++ application where I would like to call a PyTorch model on the GPU while I’m already inside of a kernel, running on the device. What library should I use to achieve this? Ideally I’d like to load up a model trained in PyTorch, then do inference inside of a running CUDA kernel. Is this possible?

For example, say I have a reinforcement learning environment implemented in a CUDA kernel. I launch 100 instances of the kernel on the GPU to compute 100 trajectories. The kernel loops over timesteps collecting states from the environment, computing next states. How can I integrate PyTorch model inference within that loop?


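You can launch other kernels from your current CUDA kernel (dynamic parallelism), but an entire PyTorch model isn’t a single kernel: a forward pass launches many separate kernels from the host. So I don’t think it’s possible to run the model from inside device code.
I’m also unsure about your use case and why you wouldn’t be able to execute the model from the host, e.g. with libtorch (C++).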
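
To make the host-side suggestion a bit more concrete, here is a rough sketch of how the loop could be restructured. The timestep loop moves out of the kernel and onto the host, and each step does one batched model call followed by one environment step for all trajectories. Everything here is an assumption for illustration: `policy_forward`, `env_step` and `rollout` are hypothetical names, the toy policy is made up, and plain CPU loops stand in for the libtorch forward pass and for your environment kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one batched model forward pass.
// In a real application this would be a host-side libtorch call on a
// tensor holding the states of all environments, not a CPU loop.
std::vector<float> policy_forward(const std::vector<float>& states) {
    std::vector<float> actions(states.size());
    for (std::size_t i = 0; i < states.size(); ++i) {
        actions[i] = -0.5f * states[i];  // toy policy: push each state toward zero
    }
    return actions;
}

// Hypothetical stand-in for the environment kernel, reduced to a single
// timestep. In CUDA this would be one kernel launch covering all
// environments, with one thread (or block) per trajectory.
void env_step(std::vector<float>& states, const std::vector<float>& actions) {
    assert(states.size() == actions.size());
    for (std::size_t i = 0; i < states.size(); ++i) {
        states[i] += actions[i];  // toy dynamics: next state = state + action
    }
}

// The timestep loop lives on the host. Each iteration runs one batched
// inference call, then one environment step for every trajectory.
std::vector<float> rollout(std::vector<float> states, int num_steps) {
    for (int t = 0; t < num_steps; ++t) {
        const std::vector<float> actions = policy_forward(states);
        env_step(states, actions);
    }
    return states;
}
```

Calling `rollout(initial_states, num_steps)` with one entry per environment advances all trajectories together. With the real model and kernel, the states could stay in GPU memory the whole time, so the extra cost per timestep would mostly be launch overhead rather than data transfer.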