Qnnpack vs. fbgemm

Hi!
I am trying to implement quantization in my model.
While applying post-training static quantization, an interesting detail came up:

quantized_model.qconfig = torch.quantization.get_default_qconfig('qnnpack')
# torch.backends.quantized.engine = 'qnnpack' # gives error

works nearly perfectly according to the performance numbers. However, qnnpack is not available as an engine on my machine.
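You can check which engines your PyTorch build actually supports before setting one; a minimal sketch (the engine choice here is just an example):

```python
import torch

# List the quantized engines this build of PyTorch supports.
# On x86 builds this typically includes 'fbgemm'; qnnpack's fast
# kernels target ARM, so it may be absent or slow on x86.
print(torch.backends.quantized.supported_engines)

# Setting the engine only succeeds for values in that list,
# which would explain the error when assigning 'qnnpack'.
if 'fbgemm' in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = 'fbgemm'
```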

Trying to use

quantized_model.qconfig = torch.quantization.get_default_qconfig('fbgemm')

led to much worse performance numbers.

Also, in my opinion this should not work, but it performs very well:

quantized_model.qconfig = torch.quantization.get_default_qconfig('qnnpack') 
torch.backends.quantized.engine = 'fbgemm'

Is this a bug? Shouldn’t fbgemm outperform qnnpack on an x86 system?
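For context, the full post-training static quantization flow keeps the qconfig and the engine consistent. A minimal sketch (the tiny model and calibration data below are made up for illustration):

```python
import torch
import torch.nn as nn

# Toy model for illustration only. QuantStub/DeQuantStub mark where
# tensors are quantized/dequantized in static quantization.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(8, 4)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.fc(x))
        return self.dequant(x)

model = TinyModel().eval()

# Keep engine and qconfig consistent -- both 'fbgemm' here.
torch.backends.quantized.engine = 'fbgemm'
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')

prepared = torch.quantization.prepare(model)
prepared(torch.randn(16, 8))   # calibration pass with made-up data
quantized = torch.quantization.convert(prepared)

out = quantized(torch.randn(2, 8))
print(out.shape)  # torch.Size([2, 4])
```

Mixing a 'qnnpack' qconfig with the 'fbgemm' engine, as in the snippet above it, can still run because both produce valid quantized models, but the observer/quantization settings then don't match the kernels being used.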

Yes, that would be expected. Does your system have AVX and AVX2 capabilities? Those are needed for the fast paths of the fbgemm kernels.

Yes, sounds like it could be a bug. Would you be able to share the per-op profiling results for the model you are seeing this for, using https://pytorch.org/docs/stable/autograd.html#profiler, on both fbgemm and qnnpack on your machine? Qnnpack only has fast kernels on ARM; on x86 it takes the slow fallback path.
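Collecting the requested per-op numbers can be sketched like this (the model and input here are placeholders, not the poster's actual model):

```python
import torch

# Placeholder model and input; substitute your own quantized model.
model = torch.nn.Linear(8, 4).eval()
x = torch.randn(32, 8)

# Record per-op timings for one forward pass.
with torch.autograd.profiler.profile() as prof:
    model(x)

# Sort by self CPU time to see which kernels dominate per backend.
print(prof.key_averages().table(sort_by="self_cpu_time_total"))
```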

Profile for fbgemm for evaluation:

------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  
Name                                  Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     Number of Calls  
------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  
mul                                   64.88%           3.584s           65.20%           3.602s           13.341ms         270              
sum                                   13.79%           761.666ms        15.01%           829.509ms        1.097ms          756              
quantized::linear                     12.68%           700.596ms        12.68%           700.596ms        19.461ms         36               
_cat                                  3.06%            168.962ms        3.14%            173.683ms        6.433ms          27               
relu                                  1.32%            73.125ms         1.34%            73.805ms         2.734ms          27               
fill_                                 1.17%            64.873ms         1.17%            64.876ms         82.855us         783              
index_select                          0.73%            40.152ms         1.22%            67.359ms         95.953us         702              
copy_                                 0.42%            23.189ms         0.42%            23.197ms         44.438us         522              
empty                                 0.39%            21.815ms         0.39%            21.815ms         9.696us          2250             
quantize_per_tensor                   0.38%            20.759ms         0.38%            20.771ms         2.308ms          9                
cat                                   0.16%            9.051ms          3.31%            182.734ms        6.768ms          27               
embedding                             0.15%            8.441ms          2.80%            154.721ms        110.200us        1404  
...

Metrics:

Size (MB): 3.466263
Loss: 1.093 (not good)
Acc: 0.622
Elapsed time (seconds): 7.084
Avg execution time per forward (ms): 0.00363

Profile for qnnpack for evaluation:

------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  
Name                                  Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     Number of Calls  
------------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  
mul                                   66.18%           3.379s           66.49%           3.395s           12.573ms         270              
sum                                   12.98%           662.933ms        14.21%           725.287ms        959.374us        756              
quantized::linear                     12.45%           635.799ms        12.45%           635.799ms        17.661ms         36               
_cat                                  3.14%            160.059ms        3.23%            164.724ms        6.101ms          27               
relu                                  1.33%            67.692ms         1.34%            68.278ms         2.529ms          27               
fill_                                 1.17%            59.914ms         1.17%            59.917ms         76.522us         783              
index_select                          0.68%            34.661ms         1.11%            56.808ms         80.923us         702              
empty                                 0.38%            19.191ms         0.38%            19.191ms         8.529us          2250             
quantize_per_tensor                   0.37%            18.920ms         0.37%            18.930ms         2.103ms          9                
copy_                                 0.35%            17.947ms         0.35%            17.954ms         34.394us         522              
embedding                             0.14%            7.034ms          2.52%            128.492ms        91.519us         1404             
...

Metrics:

Size (MB): 3.443591
Loss: 0.580 (very good)
Acc: 0.720
Elapsed time (seconds): 6.978
Avg execution time per forward (ms): 0.00427

Hmm, one hypothesis that would fit this data is that fbgemm is not enabled, and both fbgemm and qnnpack are taking the fallback paths.

cc @dskhudia , any tips?

@Vasiliy_Kuznetsov What does such a fallback path look like? What happens in that case?

@pintonos By performance do you mean the loss? It should be the same (or close enough) with both. I see similar execution times for quantized::linear with both fbgemm and qnnpack.

@dskhudia Yes, I mean the loss. I tried running it on a bigger dataset and it seems to work now…
