In my project, I am trying to compare two techniques: spatial convolution and filtering via FFT.
At this point I am only focused on the number of operations. Optimizing parallelism is not part of my scope.
So, in theory, the technique I am comparing against the standard one should be faster because it performs fewer operations. But that hasn't been the case when I time both executions. I am assuming that, even though everything runs on the CPU, there is still a lot of parallelism going on.
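For context, here is the rough multiply count behind the "fewer operations" expectation (hypothetical sizes, constant factors ignored; I am taking the usual estimates of about H·W·K² multiplies for direct convolution and about c·H·W·log₂(H·W) per 2-D FFT, with c set to 1):

```python
import math

H, W = 256, 256  # hypothetical input size

def direct_mults(K):
    # direct (spatial) convolution: one K*K dot product per output pixel
    return H * W * K * K

def fft_mults():
    # FFT filtering: two forward FFTs, a pointwise product, one inverse FFT;
    # each 2-D FFT is roughly H*W*log2(H*W) operations (constant taken as 1)
    return 3 * H * W * math.log2(H * W) + H * W

for K in (5, 31):
    print(f"K={K:2d}: direct {direct_mults(K):>12,.0f}  fft {fft_mults():>12,.0f}")
```

With these estimates, the crossover depends on the kernel size: for a small kernel (K=5) the direct count is actually lower, and the FFT route only pulls ahead for larger kernels (K=31 here).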
So, how do I turn that off?
From the PyTorch notes, I assume it would be something like this:
torch.set_num_interop_threads(1)
torch.set_num_threads(1)
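One caveat worth checking (a sketch, assuming a fresh interpreter): the standard OpenMP/MKL environment variables are read when torch is imported, so they have to be set beforehand, and `torch.set_num_interop_threads` must be called before any inter-op parallel work has started:

```python
import os

# These are read at import time; setting them after `import torch` has no effect.
# They are the standard OpenMP/MKL variables, not torch-specific ones.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import torch

# Must be called once, before any inter-op parallel work runs.
torch.set_num_interop_threads(1)
torch.set_num_threads(1)

print(torch.get_num_threads())  # → 1
```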
But even with those settings, timing both techniques still doesn't give the results I am looking for: spatial convolution is still taking less time than filtering via FFT.
PS: These are the accumulated timings for 128 runs of:

- Standard convolution
- Custom implementation of convolution, based on the GEMM method
- FFT filtering, Python implementation
- FFT filtering, C++ extension implementation
# time_default, time_custom, time_python, time_cpp
0.8500628471374512, 5.313140630722046, 2.029076337814331, 1.942748785018921
# time_default, time_custom, time_python, time_cpp
# Setting threads num. to 1
0.894277811050415, 5.291566371917725, 1.9720547199249268, 1.8894264698028564
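For reference, a minimal sketch of how such an accumulated timing can be gathered (placeholder tensor sizes, only the standard conv2d path shown; the warm-up call matters because the first invocation pays one-time dispatch and allocation costs):

```python
import time
import torch
import torch.nn.functional as F

torch.set_num_threads(1)

x = torch.randn(1, 1, 256, 256)  # hypothetical input
w = torch.randn(1, 1, 5, 5)      # hypothetical 5x5 kernel

def bench(fn, runs=128):
    fn()  # warm-up: excludes one-time dispatch/allocation costs from the timing
    t0 = time.perf_counter()
    for _ in range(runs):
        fn()
    return time.perf_counter() - t0

with torch.no_grad():
    t = bench(lambda: F.conv2d(x, w))
print(f"conv2d: {t:.4f}s accumulated over 128 runs")
```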
How come the standard conv2d is way faster when it is supposed to be slower?