I want to add support for pruning: a per-layer hint, say a flag called `pruned`, so that instead of running the convolution, the layer just allocates the correctly shaped output and returns a zero tensor. The goal is to skip unnecessary ops on weights that have been pruned away.
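To make the idea concrete, here is a minimal sketch of the Python-level behavior I have in mind, written as a PyTorch-style `nn.Conv2d` subclass (the `PrunableConv2d` name and the `pruned` flag are hypothetical, and in practice the short-circuit would live in the native C++ kernel rather than in Python):

```python
import torch
import torch.nn as nn

class PrunableConv2d(nn.Conv2d):
    """Hypothetical Conv2d with a 'pruned' hint: when set, skip the
    convolution entirely and emit a correctly shaped zero tensor."""

    def __init__(self, *args, pruned=False, **kwargs):
        super().__init__(*args, **kwargs)
        self.pruned = pruned

    def forward(self, x):
        if not self.pruned:
            return super().forward(x)
        # Compute the output spatial size with the standard conv formula,
        # floor((H + 2p - d*(k-1) - 1) / s) + 1, without touching the weights.
        n, _, h, w = x.shape
        h_out = (h + 2 * self.padding[0]
                 - self.dilation[0] * (self.kernel_size[0] - 1) - 1) // self.stride[0] + 1
        w_out = (w + 2 * self.padding[1]
                 - self.dilation[1] * (self.kernel_size[1] - 1) - 1) // self.stride[1] + 1
        # "Format the output": same dtype/device as the input, all zeros.
        return x.new_zeros(n, self.out_channels, h_out, w_out)
```

A pruned layer then produces the same output shape as a live one, so downstream layers are unaffected, e.g. `PrunableConv2d(3, 8, 3, stride=2, padding=1, pruned=True)` on a `(2, 3, 16, 16)` input returns a `(2, 8, 8, 8)` zero tensor.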
Is there an architecture document that covers the call graphs, i.e. how Python invokes the native C++ functions for the different architectures? I would like to understand the entry points for the various calls. For example, if I want to edit the C++ code for the convolution operation, which function is the entry point? There are several Conv implementations, and without guidance it is hard to keep track of them.