If I wanted to override the implementation of the backward pass of a primitive matrix operation, say, matmul or convolution, how might I go about it?
Some subquestions:
(a) is there any way to do it (efficiently) in Python (just me hoping, haha)?
(b) Assuming the answer to (a) is no, can I implement each operation once in C++ and then it’ll work for all backends, or would I need to implement each operation for each backend e.g. cuda?
Thanks!