I have an operation computed with a custom CUDA kernel, which involves a custom forward and a custom backward pass. I have this working in Python (calling into the custom C++ via PyBind11). I am now porting the entire pipeline to the PyTorch C++ frontend, since I can then do away with Python completely.
I’m struggling to figure out how the autograd infrastructure builds the computation graph for the backward pass in C++, how gradients are propagated backwards through it, and how I would fit my custom backward pass into the pipeline.
Is there any documentation on this? Or any examples where a custom backward function is implemented?
To answer my own question: I compiled libtorch from source and reverse-engineered the autogenerated functions (e.g. torch::pow(), which uses the PowBackward0 backward function).
Doing this, I came up with a minimal working example. To keep it clean, I cut out a lot of the JIT machinery from the autogenerated code, as well as the checks and balances (asserts, error-handling logic for wrong argument types, etc.).
I’d still be interested in someone’s input on how the whole autograd infrastructure works in C++, but in the meantime I hope this example helps someone.
@mikehamer Thanks a lot for sharing your code. I tried it with libtorch 1.4, and it works great with a few changes:
- `auto result = as_variable(tmp);` needs to be changed to `auto result = tmp;`, since `as_variable` is no longer available.
- `deleteFunction` and `Function` need to be replaced with `deleteNode` and `Node`.