Implementing a variant of the conv op with low-level APIs

My goal is to implement a variant of the convolution op. I have already implemented it in C++ and CUDA as a C++ extension, but the speed is not satisfactory, so I guess I’ll have to implement it with lower-level APIs. It seems “aten/src/ATen/native/Convolution.cpp” is the right place to start, and it in turn calls “aten/src/THNN/generic/SpatialConvolutionMM.c”, so I modified “SpatialConvolutionMM.c”. The problem is that after recompiling, I see no change in the outputs. Maybe the files in THNN are wrapped by “ATen/native” in some way that I cannot figure out. Could anyone give me some suggestions? Thank you.

Are you sure that particular function is actually called for your convolution?
There are several switches in Convolution.cpp, so a different algorithm may be used instead.
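
To illustrate why an edit to one kernel can have no visible effect, here is a toy sketch in Python. It is not the actual logic in Convolution.cpp: the function names, the backend labels, and the conditions are all made up for illustration. It only shows the general pattern of a front end that picks one of several kernels based on properties of the call.

```python
# Toy model of backend selection. This is NOT PyTorch's real dispatch logic;
# every name and condition below is hypothetical and chosen for illustration.

def select_backend(is_cuda, cudnn_available, dilation):
    """Return the label of the kernel a convolution call would be routed to."""
    if is_cuda and cudnn_available:
        return "vendor_gpu_kernel"      # a vendor library handles the call
    if is_cuda:
        return "generic_gpu_kernel"     # fallback GPU implementation
    if dilation > 1:
        return "cpu_dilated_kernel"     # separate CPU path for dilated convs
    return "cpu_mm_kernel"              # the matrix-multiply CPU path


def edit_is_visible(edited_backend, **call_properties):
    """True only if the call is routed to the kernel that was edited."""
    return select_backend(**call_properties) == edited_backend


if __name__ == "__main__":
    calls = [
        {"is_cuda": False, "cudnn_available": False, "dilation": 1},
        {"is_cuda": False, "cudnn_available": False, "dilation": 2},
        {"is_cuda": True, "cudnn_available": True, "dilation": 1},
    ]
    for props in calls:
        visible = edit_is_visible("cpu_mm_kernel", **props)
        print(props, "->", select_backend(**props), "| edit visible:", visible)
```

In this toy model, a change to the "cpu_mm_kernel" path shows up only for the first call. A quick way to check the real code is to put a temporary print statement in the function you edited and see whether it fires for your input.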