I have a CUDA C++ application where I would like to call a PyTorch model on the GPU while I'm already inside a kernel, running on the device. What library should I use to achieve this? Ideally, I'd like to load a model trained in PyTorch and then run inference on it from within a running CUDA kernel. Is this possible?
For example, say I have a reinforcement learning environment implemented in a CUDA kernel. I launch 100 instances of the kernel on the GPU to compute 100 trajectories in parallel. Each kernel instance loops over timesteps, reading the current state from the environment and computing the next state. How can I integrate PyTorch model inference into that loop?
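Concretely, the structure I have in mind looks something like the sketch below. Everything here is hypothetical pseudocode: `State`, `env_step`, and `policy_forward` are placeholder names, and `policy_forward` is exactly the piece I don't know how to implement, since it would need to evaluate a trained PyTorch model from device code.

```cuda
// Hypothetical kernel sketch — State, env_step, and policy_forward are
// placeholders, not real APIs. One thread rolls out one trajectory.
__global__ void rollout_kernel(State* states, int num_steps)
{
    int env_id = blockIdx.x * blockDim.x + threadIdx.x;
    State s = states[env_id];

    for (int t = 0; t < num_steps; ++t) {
        // This is the part in question: somehow run inference with a
        // trained PyTorch model on state s, from inside the kernel.
        float action = policy_forward(s);  // ??? PyTorch inference here

        // Environment transition — this part is my own CUDA code.
        s = env_step(s, action);
    }

    states[env_id] = s;
}
```

The key constraint is that the inference call sits inside the per-timestep loop, so I can't simply copy states back to the host, run the model there, and relaunch the kernel without paying launch and transfer overhead at every step.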