Is it possible to write a pure C++/CUDA mixed program without interfacing with Python?

Hi,

Maybe this is a strange question, but I’m using libtorch as a scientific computing library to interface directly with hardware. The easy-to-use C++ frontend has really saved my life in many cases, especially when I need to use CUDA to accelerate the program. However, it becomes very hard when I want to manually write some CUDA processing code to improve performance.

Tensor Basics — PyTorch master documentation is a good indication of what I’d like to do, but I don’t even know how to call those CUDA kernels inside my C++ code, or how to compile them together. I’m a newbie in CUDA programming, so any advice would be of great help!
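For concreteness, here is a rough sketch of the kind of thing I mean (the file and function names are made up for illustration): a small CUDA kernel that works on the raw data of a `torch::Tensor`, with a host-side wrapper that I would like to call from my ordinary libtorch C++ code:

```cpp
// scale.cu -- hypothetical example; a minimal sketch of a custom CUDA
// kernel launched on the data of a torch::Tensor from libtorch
#include <torch/torch.h>
#include <ATen/cuda/CUDAContext.h>
#include <cuda_runtime.h>

__global__ void scale_kernel(float* data, float factor, int64_t n) {
    int64_t i = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Host-side wrapper that ordinary C++ code could call
void scale_inplace(torch::Tensor t, float factor) {
    TORCH_CHECK(t.is_cuda(), "expected a CUDA tensor");
    TORCH_CHECK(t.is_contiguous(), "expected a contiguous tensor");
    TORCH_CHECK(t.scalar_type() == torch::kFloat, "expected a float tensor");
    const int64_t n = t.numel();
    const int threads = 256;
    const int blocks = static_cast<int>((n + threads - 1) / threads);
    // Launch on the stream PyTorch is currently using for this device
    cudaStream_t stream = at::cuda::getCurrentCUDAStream();
    scale_kernel<<<blocks, threads, 0, stream>>>(t.data_ptr<float>(), factor, n);
}
```

What I don’t know is whether this is the right pattern at all, and how to set up the build (e.g. CMake with CUDA enabled) so that the .cu file and my existing libtorch C++ code are compiled and linked together.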

The most closely related question I found is Libtorch with pure cuda codes cannot compiled, but no solution is provided there. My platform is Ubuntu 20.04.

I think one valid approach would be to add custom C++ ops as described in this tutorial. If I’m not mistaken, torchvision also uses this approach, and you can take a look at their usage in these files.
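As a rough sketch of what registering such an op can look like (the names here are made up, and the CUDA implementation is left out), something along these lines follows the pattern from that tutorial:

```cpp
// my_ops.cpp -- hypothetical sketch of a custom C++ op registration,
// following the pattern from the custom C++ ops tutorial
#include <torch/torch.h>
#include <torch/library.h>

torch::Tensor my_scale(torch::Tensor input, double factor) {
  // In a real op you would dispatch to your hand-written CUDA kernel
  // when input.is_cuda(); a plain ATen expression stands in for it here.
  return input * factor;
}

// Registers my_ops::my_scale so it can be found through the dispatcher
TORCH_LIBRARY(my_ops, m) {
  m.def("my_scale", my_scale);
}
```

The actual CUDA kernel would then typically live in a .cu file compiled by nvcc and linked into the same library, which is roughly how torchvision organizes its CUDA ops.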