Custom kernels for Intel GPUs

Is it possible to write (simple) custom kernels for Intel GPUs similar to the approach for NVIDIA GPUs using CUDA extensions?

I am used to programming in C++ and have started looking into SYCL and some of the XPU functionality in PyTorch. However, I am missing a minimal example showing how basic functions like Tensor.pow() are implemented for the XPU backend.
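Something like the following minimal sketch is what I have in mind: an element-wise kernel run directly on an XPU tensor through raw SYCL. This assumes the XPU tensor is backed by a USM device allocation and that `c10::xpu::getCurrentXPUStream().queue()` exposes the underlying `sycl::queue`; the exact header and function names may differ between PyTorch versions.

```cpp
// Minimal sketch: element-wise x*x on an XPU tensor via a hand-written SYCL kernel.
// Assumptions: PyTorch built with XPU support, float tensors, and that
// c10::xpu::getCurrentXPUStream().queue() returns the active sycl::queue
// (names may vary across PyTorch versions).
#include <torch/torch.h>
#include <c10/xpu/XPUStream.h>  // assumed location of the XPU stream/queue helpers
#include <sycl/sycl.hpp>

torch::Tensor square_xpu(const torch::Tensor& input) {
  TORCH_CHECK(input.device().is_xpu(), "expected an XPU tensor");
  TORCH_CHECK(input.scalar_type() == torch::kFloat, "expected a float tensor");

  auto in  = input.contiguous();
  auto out = torch::empty_like(in);

  const float* src = in.data_ptr<float>();
  float*       dst = out.data_ptr<float>();
  const int64_t n  = in.numel();

  // Grab the SYCL queue associated with the current XPU stream (assumed API).
  sycl::queue& q = c10::xpu::getCurrentXPUStream().queue();

  // One work-item per element; pointers are USM device pointers, so they can
  // be captured and dereferenced inside the kernel.
  q.parallel_for(sycl::range<1>(static_cast<size_t>(n)), [=](sycl::id<1> i) {
    dst[i] = src[i] * src[i];
  });
  q.wait();  // synchronize here for simplicity; a real kernel would rely on stream ordering

  return out;
}
```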

Hi @Matthias_Moller, we are working on providing the C++ extension for Intel GPU and will keep you posted as soon as the PR lands.

Hi @EikanWang, I am very happy to hear that this is a work in progress. Is there perhaps some part of the LibTorch source code (C++ API) that I could already look at to see how the existing kernels are implemented? As mentioned (maybe in my other post), I am not using the Python API but the C++ API directly, which might make it easier to integrate a few extra kernels.

As of now, we have not distributed libtorch for XPU. I think the Intel Extension for PyTorch may serve as another example of how to extend ATen operations through SYCL. However, its codebase is fairly heavy, so the logic takes some effort to follow. Is libtorch a must-have for your case?
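For reference, a rough sketch of how a custom operator can be bound to the XPU dispatch key with libtorch so that it is routed through the dispatcher. The operator name `myops::square` and the kernel body below are placeholders for illustration, not an existing API; a real implementation would launch a SYCL kernel like the sketch above instead of the ATen fallback used here.

```cpp
// Sketch: register a custom operator and bind an XPU-specific kernel to it.
// The namespace "myops" and the trivial kernel are hypothetical placeholders.
#include <torch/torch.h>
#include <torch/library.h>

// Placeholder XPU kernel; a real one would launch a hand-written SYCL kernel.
torch::Tensor square_xpu_impl(const torch::Tensor& input) {
  TORCH_CHECK(input.device().is_xpu(), "expected an XPU tensor");
  return input * input;
}

// Declare the operator schema once.
TORCH_LIBRARY(myops, m) {
  m.def("square(Tensor input) -> Tensor");
}

// Bind the kernel to the XPU dispatch key.
TORCH_LIBRARY_IMPL(myops, XPU, m) {
  m.impl("square", square_xpu_impl);
}
```

From C++, the operator can then be looked up via `c10::Dispatcher::singleton().findSchemaOrThrow("myops::square", "")` and called like any built-in op, so the kernel stays usable from both the C++ and Python APIs.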

I noticed your other post and we will update there.

Yes, libtorch is crucial, as my entire codebase is written in C++. When installing PyTorch as described in Getting Started on Intel GPU — PyTorch 2.5 documentation, I can find the libtorch libraries inside the Python installation. Except for my few custom kernels, my code runs fine.

FYI: xpu: support sycl with torch.utils.cpp_extension APIs by dvrogozh · Pull Request #132945 · pytorch/pytorch