TorchScript custom ops with CUDA

I want to replace my custom C++/CUDA ops with custom TorchScript C++/CUDA ops so that I can export a model from Python to C++. Currently the Extending TorchScript with Custom C++ Operators tutorial only handles the C++ use case, but at the end it states:

You are now ready to extend your TorchScript models with C++ operators that interface with third party C++ libraries, write custom high performance CUDA kernels, or implement any other use case that requires the lines between Python, TorchScript and C++ to blend smoothly.

On the other hand, the Writing a Mixed C++/CUDA Extension tutorial states that there is some magic going on :)

The general strategy for writing a CUDA extension is to first write a C++ file which defines the functions that will be called from Python, and binds those functions to Python with pybind11. Furthermore, this file will also declare functions that are defined in CUDA ( .cu ) files. The C++ functions will then do some checks and ultimately forward its calls to the CUDA functions. In the CUDA files, we write our actual CUDA kernels. The cpp_extension package will then take care of compiling the C++ sources with a C++ compiler like gcc and the CUDA sources with NVIDIA’s nvcc compiler. This ensures that each compiler takes care of files it knows best to compile. Ultimately, they will be linked into one shared library that is available to us from Python code.

The question is: how do I do the same for a TorchScript op?


You can do the very same thing: write your kernel in the .cu file and put the op registration in the .cpp file.
I usually recommend using load/load_inline with is_python_module=False as a first step when moving from C++ extensions to custom ops.

Best regards


Thank you, Tom. I was not sure what would trigger compilation of the .cu source.
By the way, torchvision recently introduced ops registered under torch.ops, so I will have to analyze how it does it.