Is it possible to write (simple) custom kernels for Intel GPUs, similar to how CUDA extensions are used for NVIDIA GPUs?
I am used to programming in C++ and have started looking into SYCL and some of the XPU functionality in PyTorch. However, I have not found a minimal example showing how basic functions like Tensor.pow() are implemented for the XPU backend.