Writing custom CUDA kernels for PyTorch

I am trying to write a custom CUDA kernel for PyTorch for a specific computation. Is there any documentation available on writing custom CUDA kernels for PyTorch?

There’s a gist that demonstrates how you could use PyCUDA for that.
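For a rough idea of what the PyCUDA route can look like, here is a minimal sketch. It is not the code from the gist. The kernel `scale` and the helpers `Holder`, `grid_size`, and `scale_with_pycuda` are names made up for illustration. It assumes a CUDA GPU, a working `nvcc`, and PyCUDA 2020.1 or newer (for `retain_primary_context`), and it only handles contiguous float32 tensors.

```python
# Hypothetical example: run a hand-written CUDA kernel on PyTorch tensors via PyCUDA.

KERNEL_SRC = r"""
__global__ void scale(const float *x, float *y, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = a * x[i];
    }
}
"""


def grid_size(n, threads_per_block=256):
    """Number of blocks needed to cover n elements (ceiling division)."""
    return (n + threads_per_block - 1) // threads_per_block


def scale_with_pycuda(x, a, threads_per_block=256):
    """Return a * x for a contiguous float32 CUDA tensor x."""
    # GPU-only imports are kept local so grid_size() works without a GPU.
    import numpy as np
    import torch
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    class Holder(drv.PointerHolderBase):
        """Lets PyCUDA treat a tensor's device memory as a pointer argument."""

        def __init__(self, tensor):
            super().__init__()
            self.tensor = tensor  # keep the tensor alive while in use

        def get_pointer(self):
            return self.tensor.data_ptr()

    assert x.is_cuda and x.dtype == torch.float32 and x.is_contiguous()
    y = torch.empty_like(x)
    n = x.numel()

    drv.init()
    # Assumption: share PyTorch's (primary) context instead of creating a new one.
    ctx = drv.Device(x.device.index).retain_primary_context()
    ctx.push()
    try:
        func = SourceModule(KERNEL_SRC).get_function("scale")
        torch.cuda.synchronize()  # make sure x is ready before the launch
        func(Holder(x), Holder(y), np.float32(a), np.int32(n),
             block=(threads_per_block, 1, 1),
             grid=(grid_size(n, threads_per_block), 1, 1))
        drv.Context.synchronize()  # wait for the kernel to finish
    finally:
        ctx.pop()
    return y
```

On a machine with a GPU you would call it as `y = scale_with_pycuda(torch.arange(1000, dtype=torch.float32, device="cuda"), 2.0)`. Note that autograd knows nothing about a kernel launched this way, so you would still need to wrap it in a `torch.autograd.Function` with your own backward if you want gradients.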
