I want to replace my custom C++/CUDA ops with custom TorchScript C++/CUDA ops, so that I can export a model from Python to C++. Currently the EXTENDING TORCHSCRIPT WITH CUSTOM C++ OPERATORS tutorial only handles the plain C++ use case, but at the end it states:
You are now ready to extend your TorchScript models with C++ operators that interface with third party C++ libraries, write custom high performance CUDA kernels, or implement any other use case that requires the lines between Python, TorchScript and C++ to blend smoothly.
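For reference, the registration pattern that tutorial teaches looks roughly like the sketch below. The op name and body are my own placeholders (the tutorial itself uses a `warp_perspective` example), so treat this as an illustration of the pattern, not the tutorial's exact code:

```cpp
// op.cpp -- a plain C++ TorchScript op, following the custom C++ operators tutorial.
#include <torch/script.h>

// Hypothetical op body: just scales the input tensor.
torch::Tensor my_scale(torch::Tensor input, double factor) {
  return input * factor;
}

// Registering the function makes it callable as torch.ops.my_ops.my_scale
// from both eager Python and TorchScript, and from C++ after loading the
// shared library with torch.ops.load_library / dlopen.
static auto registry =
    torch::RegisterOperators("my_ops::my_scale", &my_scale);
```

Note there is no pybind11 module here at all: registration goes through the operator registry, which is what makes the op visible to TorchScript serialization.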
On the other hand, the Writing a Mixed C++/CUDA extension tutorial states that there is some machinery going on behind the scenes:
The general strategy for writing a CUDA extension is to first write a C++ file which defines the functions that will be called from Python, and binds those functions to Python with pybind11. Furthermore, this file will also declare functions that are defined in CUDA (.cu) files. The C++ functions will then do some checks and ultimately forward its calls to the CUDA functions. In the CUDA files, we write our actual CUDA kernels. The cpp_extension package will then take care of compiling the C++ sources with a C++ compiler like gcc, and the CUDA sources with NVIDIA's nvcc compiler. This ensures that each compiler takes care of the files it knows best to compile. Ultimately, they will be linked into one shared library that is available to us from Python code.
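Concretely, the split described in that quote can be sketched as follows. All names here (`my_op`, `my_op_cuda`) are placeholders I made up; the declared CUDA function would live in a separate .cu file compiled by nvcc:

```cpp
// my_op.cpp -- C++ side of a mixed C++/CUDA extension:
// declares the CUDA function, does input checks, forwards to it,
// and binds the wrapper to Python with pybind11.
#include <torch/extension.h>

// Declared here, defined in a separate .cu file that nvcc compiles.
torch::Tensor my_op_cuda(torch::Tensor input);

// The "some checks" step from the tutorial: device and layout guards.
torch::Tensor my_op(torch::Tensor input) {
  TORCH_CHECK(input.is_cuda(), "input must be a CUDA tensor");
  TORCH_CHECK(input.is_contiguous(), "input must be contiguous");
  return my_op_cuda(input);  // forward to the CUDA implementation
}

// pybind11 binding -- this is the part that a TorchScript op would
// presumably replace with operator registration instead.
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("my_op", &my_op, "my_op (CUDA)");
}
```

The two translation units are then compiled by their respective compilers and linked into one shared library, exactly as the quoted paragraph describes.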
The question is: how do I do the same thing for a TorchScript op?