Custom ROCm HIP and C++ extensions

Dear PyTorch developers and community,

We have a nice tutorial, cpp_extension, on custom CUDA extensions, written by Peter Goldsborough. I’m wondering whether the same can be done on AMD GPUs with kernels written in ROCm HIP. Specifically, I would like to call a custom forward and backward HIP kernel from PyTorch and include it in a deep learning pipeline. Is this currently supported, and are there any limitations?

Does anybody have experience writing custom HIP/C++ kernels and using them in PyTorch?