How to set QNNPack parallelism?

For our use case we need to do the job on one thread (preferably the main one). For all other workflows we just call torch.set_num_threads(1) and that gets the job done :slightly_smiling_face:.
But QNNPack (more specifically, the quantized convolution) doesn’t respect this setting, since it uses caffe2::pthreadpool_().
In our tests we get good results by making that function return nullptr.

Is there a way to set the number of threads for that pool without recompiling PyTorch?
If not, what’s the minimal acceptable refactoring to make it configurable or to make it respect torch.set_num_threads?
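
For concreteness, here is a minimal sketch of the setup in question (the module construction and quantization parameters are illustrative assumptions, not our exact model):

```python
import torch

torch.backends.quantized.engine = "qnnpack"  # select the QNNPACK kernels
torch.set_num_threads(1)                     # respected by most ATen ops...

# ...but the quantized convolution still fans out over the workers of
# caffe2::pthreadpool_(), which the setting above does not touch.
qconv = torch.nn.quantized.Conv2d(3, 8, kernel_size=3)
x = torch.quantize_per_tensor(
    torch.randn(1, 3, 32, 32), scale=0.1, zero_point=0, dtype=torch.quint8
)
y = qconv(x)
```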

This is related to mobile; can you add a ‘mobile’ tag?

Our use case is not for mobile. We plan to use QNNPACK if the machine we’re running on doesn’t support AVX2.

Should I add the mobile tag despite that?

I’m not sure how to add the mobile tag.

Changed the category for you. :wink:


@ptrblck thanks (:
@jerryzh168 any updates on this?

@dbalchev, you can try caffe2::pthreadpool()->set_thread_count(1).
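
Roughly, that call could be wired up from an inline C++ extension. This is only a sketch: the header path caffe2/utils/threadpool/pthreadpool-cpp.h and the availability of the caffe2::pthreadpool() symbol are assumptions about what the installed wheel exposes, and (as the follow-up below notes) it may not build against a stock wheel:

```python
from torch.utils.cpp_extension import load_inline

cpp_source = """
#include <caffe2/utils/threadpool/pthreadpool-cpp.h>

void set_pthreadpool_thread_count(int64_t n) {
  // The singleton pool that the QNNPACK kernels dispatch onto.
  caffe2::pthreadpool()->set_thread_count(n);
}
"""

shim = load_inline(
    name="pthreadpool_shim",
    cpp_sources=cpp_source,
    functions=["set_pthreadpool_thread_count"],
)
shim.set_pthreadpool_thread_count(1)
```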

@kimishpatel We’re using the Python wheel for inference. I’ve tried recompiling PyTorch so that torch.set_num_threads calls pthreadpool()->set_thread_count as well (PR with the rebased commit). It works, but then we have to ship a custom PyTorch wheel. I didn’t find a way to do this through the Python API or even with a C++ extension.