Are there instructions for properly including CUDA kernels in custom TorchScript ops built with CMake? I haven't been able to find any, and I've hit an odd nvcc-related compilation error when I attempt it (see the attached log file and the repository I'm using to test these concepts).
For my use case, I need to serialize a model into TorchScript that references custom CUDA ops; the model must also be executable from C++/Rust. The instructions for compiling a C++ operator for TorchScript here work well for pure C++ operators: the op is both traceable in Python and usable from C++ and Rust. However, I also need to include C++ operators that launch CUDA kernels. (In fact, I'm porting a neural net that used the JIT method for compiling these, but from C++/Rust I need to link against a DLL or shared object, so as far as I know I cannot use that approach.)
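For reference, this is a minimal sketch of the CMakeLists.txt I would expect to work, extending the custom-op tutorial's setup by enabling CUDA as a first-class language so CMake drives nvcc. The target and file names (`custom_ops`, `op.cpp`, `kernel.cu`) are placeholders, not the actual names from my repository:

```cmake
cmake_minimum_required(VERSION 3.18)
# Enable CUDA alongside C++ so .cu files are compiled with nvcc
project(custom_ops LANGUAGES CXX CUDA)

find_package(Torch REQUIRED)

# op.cpp registers the TorchScript op; kernel.cu contains the CUDA kernel
# (hypothetical filenames for illustration)
add_library(custom_ops SHARED op.cpp kernel.cu)
target_link_libraries(custom_ops "${TORCH_LIBRARIES}")
target_compile_features(custom_ops PRIVATE cxx_std_17)
```

My error occurs with a configuration along these lines, which is what makes me suspect the flags that `find_package(Torch)` appends are being forwarded to nvcc in a way it does not accept.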