Recently, I learned how to use CUDA C to write a few custom layers (simple ones, e.g. calling a kernel for matrix multiplication). I wonder how I can leverage PyTorch's autograd to turn these CUDA C layers into autograd-aware layers, like the classes that inherit from nn.Module in PyTorch.
I have read many forum discussions, as well as the official PyTorch documentation here. However, I find those examples a bit complex: they assume many prerequisites, such as pybind11 and just-in-time (JIT) compilation. Is there an example anywhere that writes a Conv2D in CUDA C and walks through the steps to make it work as an nn.Module (i.e., with autograd) in PyTorch? If not, where should I start learning how to do this?
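For context, here is a minimal sketch of the shape I'm after, written in pure Python with `torch.matmul` standing in for my CUDA C matrix-multiplication kernel (the class name and the plain-torch forward are just placeholders I made up, not my actual kernel):

```python
import torch

class MyMatMul(torch.autograd.Function):
    """Custom op with a hand-written backward pass.

    In my real code, forward() would call the CUDA C kernel;
    here torch.matmul is a stand-in so the structure is clear.
    """

    @staticmethod
    def forward(ctx, a, b):
        # Save inputs needed to compute gradients in backward().
        ctx.save_for_backward(a, b)
        return torch.matmul(a, b)

    @staticmethod
    def backward(ctx, grad_out):
        a, b = ctx.saved_tensors
        # For C = A @ B: dL/dA = dL/dC @ B^T, dL/dB = A^T @ dL/dC
        return grad_out @ b.t(), a.t() @ grad_out
```

My understanding is that once something like this works, it can be wrapped in an `nn.Module` whose `forward` calls `MyMatMul.apply(a, b)`, and `torch.autograd.gradcheck` can verify the backward pass against numerical gradients. Is that the right mental model, and does the same pattern carry over when `forward`/`backward` call into compiled CUDA C instead?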
Thank you very much for your time.