How to set QNNPack parallelism?

For our use case we need to do the job on one thread (preferably the main one). For all other workflows we just call torch.set_num_threads(1) and this gets the job done :slightly_smiling_face:.
But QNNPACK (more specifically the quantized convolution) doesn’t respect this setting, since it uses caffe2::pthreadpool_().
In our tests we get good results by making that function return nullptr.

Is there a way to set the num_threads of that pool without recompiling PyTorch?
If not, what’s the minimal acceptable refactoring to make it configurable or have it respect torch.set_num_threads?

This is related to mobile; can you add a ‘mobile’ tag?

Our use case is not for mobile. We plan to use QNNPACK if the machine we’re running on doesn’t support AVX2.

Should I add the mobile tag despite that?

I’m not sure how to add the mobile tag.

Changed the category for you. :wink:


@ptrblck thanks (:
@jerryzh168 any updates on this?

@dbalchev, you can try caffe2::pthreadpool()->set_thread_count(1).
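Something along these lines, as a sketch (assuming the caffe2 thread-pool header is on your include path; the exact path may vary between PyTorch versions):

```cpp
// Sketch: limit the caffe2/QNNPACK thread pool to a single thread.
// The header path is an assumption and may differ across PyTorch versions.
#include <caffe2/utils/threadpool/pthreadpool-cpp.h>

void limit_qnnpack_to_one_thread() {
  // caffe2::pthreadpool() returns the process-wide pool that the quantized
  // QNNPACK kernels parallelize over; with a single thread the work runs
  // on the calling thread.
  caffe2::pthreadpool()->set_thread_count(1);
}
```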

@kimishpatel We’re using the Python wheel for inference. I’ve tried recompiling PyTorch, so that torch.set_num_threads calls pthreadpool()->set_thread_count as well (PR with the rebased commit). It works, but we have to ship a custom PyTorch wheel. I didn’t find a way to do that through the Python API or even with a C++ extension.
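For reference, the extension I had in mind was roughly the sketch below (hypothetical; it only builds if the caffe2 thread-pool header and the caffe2::pthreadpool symbol actually ship with the wheel, which didn’t seem to be the case for me):

```cpp
// set_qnnpack_threads.cpp -- hypothetical torch C++ extension, built with
// torch.utils.cpp_extension. Assumes the caffe2 thread-pool header and
// symbol are available in the installed wheel.
#include <torch/extension.h>
#include <caffe2/utils/threadpool/pthreadpool-cpp.h>

// Resize the pool that QNNPACK's quantized kernels use.
void set_qnnpack_num_threads(size_t n) {
  caffe2::pthreadpool()->set_thread_count(n);
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("set_qnnpack_num_threads", &set_qnnpack_num_threads,
        "Set the thread count of the caffe2/QNNPACK thread pool");
}
```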

@dbalchev sorry for the late response. For some reason I did not get any notification. Unfortunately, at the moment we don’t have a Python API for this. Possibly in the next release, or we can think about introducing it in master if that would work for you.

@kimishpatel Thanks for the reply! In my PR I set the size of that pool in torch.set_num_threads, and it’s been approved (: