I have an operation computed with a custom CUDA kernel, with custom forward and backward passes. I have this working in Python (calling the custom C++ via PyBind11). I am now porting the entire pipeline to the PyTorch C++ frontend, since I can then do away with Python completely.
I’m struggling to figure out how the autograd infrastructure builds the computation graph for the backward pass in C++, how gradients are propagated backwards, and how I would fit my custom backward function into the pipeline.
Is there any documentation on this? Or any examples where a custom backward function is implemented?