Is it possible to write (simple) custom kernels for Intel GPUs similar to the approach for NVIDIA GPUs using CUDA extensions?
I am used to programming in C++ and started looking into SYCL and some of the XPU functionality in PyTorch. However, I am missing a minimal example showing how basic functions like Tensor.pow() are implemented for the XPU backend.
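To make the question concrete, here is a minimal standalone sketch of what the device-side part of such a kernel could look like in plain SYCL 2020, outside of PyTorch: an element-wise pow over a buffer, submitted to a GPU queue. It needs a SYCL compiler such as oneAPI DPC++ (icpx) to build; the names here are illustrative, not taken from PyTorch's sources.

```cpp
#include <sycl/sycl.hpp>
#include <vector>

int main() {
  // Pick a GPU if available; gpu_selector_v throws if none is present.
  sycl::queue q{sycl::gpu_selector_v};

  std::vector<float> in{1.f, 2.f, 3.f, 4.f};
  std::vector<float> out(in.size());

  {
    // Buffers hand ownership to the runtime for the scope's duration.
    sycl::buffer<float> buf_in(in.data(), sycl::range<1>(in.size()));
    sycl::buffer<float> buf_out(out.data(), sycl::range<1>(out.size()));

    q.submit([&](sycl::handler& h) {
      sycl::accessor a{buf_in, h, sycl::read_only};
      sycl::accessor b{buf_out, h, sycl::write_only, sycl::no_init};
      // One work-item per element: b[i] = a[i]^2.
      h.parallel_for(sycl::range<1>(in.size()), [=](sycl::id<1> i) {
        b[i] = sycl::pow(a[i], 2.0f);
      });
    });
  }  // Buffer destructors synchronize and copy results back to out.
}
```

The pattern (queue, accessors, parallel_for over the element count) is the SYCL analogue of a CUDA `<<<grid, block>>>` element-wise launch.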
Hi @EikanWang, I am very happy to hear that this is WIP. Is there perhaps some part of the LibTorch source code (C++ API) that I could already look at to see how the existing kernels are implemented? As mentioned (maybe in my other post), I am not using the Python API but the C++ API directly, which might make it easier to integrate a few extra kernels.
As of now, we have not distributed libtorch for XPU. I think the Intel Extension for PyTorch may be another example that demonstrates how to extend ATen operations through SYCL. However, its logic is somewhat heavy to work through. Is libtorch a must-have for your case?
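For extending ATen from C++, the general mechanism (independent of the Intel Extension for PyTorch's internals) is to register a custom operator and bind an implementation to the XPU dispatch key via `TORCH_LIBRARY` / `TORCH_LIBRARY_IMPL`. A hedged sketch, assuming a hypothetical namespace `myops` and op `my_pow`; the SYCL launch itself is elided, and this only builds against a libtorch that ships the XPU dispatch key:

```cpp
#include <torch/library.h>
#include <torch/torch.h>

// Hypothetical XPU implementation; a real version would launch a SYCL
// kernel over self's data instead of this placeholder computation.
at::Tensor my_pow_xpu(const at::Tensor& self, double exponent) {
  // ... enqueue SYCL kernel here ...
  return self.pow(exponent);  // placeholder fallback
}

// Declare the operator schema once.
TORCH_LIBRARY(myops, m) {
  m.def("my_pow(Tensor self, float exponent) -> Tensor");
}

// Bind the implementation to the XPU backend's dispatch key.
TORCH_LIBRARY_IMPL(myops, XPU, m) {
  m.impl("my_pow", my_pow_xpu);
}
```

With this in place, the op is reachable from C++ through the dispatcher (and from Python as `torch.ops.myops.my_pow`), and the dispatcher routes XPU tensors to `my_pow_xpu`.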
Yes, libtorch is crucial, as my entire code is written in C++. When installing PyTorch as described in Getting Started on Intel GPU — PyTorch 2.5 documentation, I can find the libtorch libraries somewhere in the Python installation. Except for my few custom kernels, my code runs fine.