Libtorch + CUDA kernels

I’m writing a module using libtorch with lots of point-wise operations (a physics engine).
It runs quite slowly on CUDA, and as I understand it, I need to use custom kernels.
Is there an ‘official’ way to do this?
I saw this tutorial, but the information relates only to the Python front end.

Is there a reason you must use libtorch? The typical recommendation for PyTorch 2.0+ is to use torch.compile for workloads with many pointwise operations that are amenable to, e.g., operator fusion: torch.compile Tutorial — PyTorch Tutorials 2.0.1+cu117 documentation