I saw a mention of a “new custom op API”. I am currently in the middle of rewriting my extension (I have already rewritten it twice before as the suggested method changed, and this third time I am rewriting the C/CUDA code to be independent of PyTorch so that I can provide it as shared libraries that work without the user having to compile anything), but I would like to know whether I should wait.
I would recommend not waiting. The existing custom op APIs are powerful enough that one can do a lot with them; you just need to be careful while using them to avoid some potential footguns. (For concreteness, there is a sketch of the kind of existing API I mean after the questions below.) Though, if you have a moment, I’m curious about your use case:
- are you writing C++ code and want to use that with PyTorch? Or are you writing Python code?
- are you doing training? Do you want your custom operator to work with torch.compile?
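Here is that sketch: a minimal example of the existing Python-level API (torch.autograd.Function) with a custom backward. The op itself is a made-up placeholder, not anything specific to your project:

```python
import torch

# Minimal sketch of the existing Python-level custom op API:
# torch.autograd.Function with a hand-written backward.
# The op (a scaled square) is just a placeholder.
class ScaledSquare(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, scale):
        ctx.save_for_backward(x)
        ctx.scale = scale
        return scale * x * x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # One gradient per forward input; scale is a plain float, so None.
        return grad_out * 2 * ctx.scale * x, None

x = torch.randn(8, requires_grad=True)
ScaledSquare.apply(x, 0.5).sum().backward()
```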
Hi Richard,
Thank you for your reply and for your interest in what I am working on, and apologies for the delay in replying (I missed the notification).
I have unfortunately never been able to get adequate performance with Python (even with torch.compile), so the computational part of the code is in C/C++ (for the CPU version) and CUDA.
It is used for training, but it is not a standard neural network. It implements a physical simulation using a finite difference discretization of time and space. Backpropagation is generally used to optimize the inputs (the spatial models of physical properties, source time series, etc.), but some users combine the physical simulation with a neural network (the outputs of one are the inputs of the other, for example).
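To give a concrete (if toy) picture of that workflow, here is a rough sketch; the simulation, tensor names, and loss are hypothetical stand-ins rather than my actual code:

```python
import torch

# Hypothetical stand-in for the finite difference forward simulation.
def simulate(model):
    return torch.cumsum(model, dim=-1)

observed = torch.randn(100)                   # stand-in for recorded data
model = torch.zeros(100, requires_grad=True)  # e.g. a model of physical properties

# Backpropagation is used to optimize the simulation's inputs,
# not the weights of a neural network.
optimizer = torch.optim.Adam([model], lr=0.1)
for _ in range(100):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(simulate(model), observed)
    loss.backward()
    optimizer.step()
```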
The most recent previous implementation followed the approach needed for my extension to be compatible with torch.jit (inheriting from the C++ class torch::autograd::Function and handling the dispatch), but I doubt that anyone actually used that ability. This time I am simply inheriting from the Python class autograd.Function, calling my compiled forward/backward functions from it, and not worrying (for now) about supporting torch.jit.script or torch.compile being applied to it. After encountering too many users who struggled to install the previous version because of difficulties with the compilation step, my main concern now is achieving the easiest installation that still provides adequate performance. That is why I have decided to distribute the native code as precompiled shared/dynamic libraries with no dependence on Python, PyTorch, etc., with the appropriate one for the user’s platform being loaded by ctypes at runtime.
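In case it is useful, here is a rough sketch of what that looks like; the library name and the C function signatures are hypothetical:

```python
import ctypes
import torch

# Hypothetical precompiled shared library (chosen per platform at import
# time), loaded with ctypes so there is no compilation step for the user.
lib = ctypes.CDLL("libmysim_cpu.so")
lib.sim_forward.restype = None
lib.sim_backward.restype = None

def _ptr(t):
    # Pass a contiguous tensor's data pointer to the C code.
    return ctypes.c_void_p(t.contiguous().data_ptr())

class Simulation(torch.autograd.Function):
    @staticmethod
    def forward(ctx, model):
        out = torch.empty_like(model)
        lib.sim_forward(_ptr(model), _ptr(out),
                        ctypes.c_int64(model.numel()))
        ctx.save_for_backward(model)
        return out

    @staticmethod
    def backward(ctx, grad_out):
        (model,) = ctx.saved_tensors
        grad_model = torch.empty_like(model)
        lib.sim_backward(_ptr(grad_out), _ptr(model), _ptr(grad_model),
                         ctypes.c_int64(model.numel()))
        return grad_model
```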