For our use case we need to run the job on a single thread (preferably the main one). For all other workflows, calling torch.set_num_threads(1) gets the job done.
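For reference, the single-thread setup described above is just a call to the standard PyTorch threading API (this is a minimal sketch of that workflow, not anything QNNPACK-specific):

```python
import torch

# Restrict intra-op parallelism to one thread. This covers most PyTorch
# operators, but not thread pools that ignore this setting (e.g. the
# caffe2 pthreadpool used by QNNPACK's quantized convolution).
torch.set_num_threads(1)

print(torch.get_num_threads())  # 1
```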
But QNNPACK (more specifically, its quantized convolution) doesn’t respect this setting, since it uses caffe2::pthreadpool_().
In our tests we get good results by making that function return nullptr.
Is there a way to set the num_threads of that pool without recompiling pytorch?
If not, what’s the minimal acceptable refactoring to make it configurable or to have it respect torch.set_num_threads?
@kimishpatel We’re using the Python wheel for inference. I’ve tried recompiling PyTorch so that torch.set_num_threads also calls pthreadpool()->set_thread_count (PR with the rebased commit). It works, but then we have to ship a custom PyTorch wheel. I didn’t find a way to do this through the Python API, or even with a C++ extension.
@dbalchev Sorry for the late response; for some reason I did not get any notification. Unfortunately, at the moment we don’t have a Python API for this. Possibly in the next release, or we can think about introducing it in master if that would work for you.