pintonos
(Pintonos)
September 21, 2020, 3:07pm
1
Hi!
I am trying to implement quantization in my model.
While working on post-training static quantization, an interesting detail came up:
quantized_model.qconfig = torch.quantization.get_default_qconfig('qnnpack')
# torch.backends.quantized.engine = 'qnnpack' # gives error
works nearly perfectly according to the performance numbers. However, qnnpack
is not available as an engine on my machine.
Trying to use
quantized_model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
led to much worse performance numbers.
Also, in my opinion the following should not work, but it performs very well:
quantized_model.qconfig = torch.quantization.get_default_qconfig('qnnpack')
torch.backends.quantized.engine = 'fbgemm'
Is this a bug? Shouldn't fbgemm outperform qnnpack on an x86 system?
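For context, here is a minimal sketch of the full setup as I understand it, with the qconfig and the engine kept consistent (TinyNet is just a hypothetical stand-in, not my actual model):

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> quantized boundary
        self.fc = nn.Linear(8, 8)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()  # quantized -> float boundary

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.fc(x))
        return self.dequant(x)

model = TinyNet().eval()

backend = 'fbgemm'  # 'fbgemm' on x86, 'qnnpack' on ARM
model.qconfig = torch.quantization.get_default_qconfig(backend)
torch.backends.quantized.engine = backend  # keep engine and qconfig consistent

prepared = torch.quantization.prepare(model)
prepared(torch.randn(4, 8))  # calibration pass with representative data
quantized = torch.quantization.convert(prepared)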
Yes, that would be expected. Does your system have AVX and AVX2 capabilities? Those are needed for the fast paths of the fbgemm kernels.
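One quick way to check is something like the following sketch (the /proc/cpuinfo read assumes Linux):

import torch

# Quantized engines this PyTorch build can actually use on this machine
print(torch.backends.quantized.supported_engines)

# On Linux, the CPU flags reveal AVX/AVX2 support
with open('/proc/cpuinfo') as f:
    flags = next(line for line in f if line.startswith('flags'))
print('avx' in flags, 'avx2' in flags)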
Yes, sounds like it could be a bug. Would you be able to share the per-op profiling results for the model you are seeing this for, collected with the torch.autograd profiler, on both fbgemm and qnnpack on your machine? QNNPACK only has fast kernels on ARM; on x86 it takes the slow fallback path.
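Something like this sketch should do it (assuming model and example_input are your quantized model and a representative batch):

import torch
from torch.autograd import profiler

# Run one inference pass under the profiler, without autograd overhead
with torch.no_grad(), profiler.profile() as prof:
    model(example_input)

# Per-op summary, sorted by self CPU time
print(prof.key_averages().table(sort_by='self_cpu_time_total'))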
pintonos
(Pintonos)
September 23, 2020, 6:42am
5
Profile for fbgemm for evaluation:

----------------------  ----------------  --------------  -----------  ---------  ------------  ---------------
Name                    Self CPU total %  Self CPU total  CPU total %  CPU total  CPU time avg  Number of Calls
----------------------  ----------------  --------------  -----------  ---------  ------------  ---------------
mul                               64.88%          3.584s       65.20%     3.602s      13.341ms              270
sum                               13.79%       761.666ms       15.01%  829.509ms       1.097ms              756
quantized::linear                 12.68%       700.596ms       12.68%  700.596ms      19.461ms               36
_cat                               3.06%       168.962ms        3.14%  173.683ms       6.433ms               27
relu                               1.32%        73.125ms        1.34%   73.805ms       2.734ms               27
fill_                              1.17%        64.873ms        1.17%   64.876ms      82.855us              783
index_select                       0.73%        40.152ms        1.22%   67.359ms      95.953us              702
copy_                              0.42%        23.189ms        0.42%   23.197ms      44.438us              522
empty                              0.39%        21.815ms        0.39%   21.815ms       9.696us             2250
quantize_per_tensor                0.38%        20.759ms        0.38%   20.771ms       2.308ms                9
cat                                0.16%         9.051ms        3.31%  182.734ms       6.768ms               27
embedding                          0.15%         8.441ms        2.80%  154.721ms     110.200us             1404
...
Metrics:
Size (MB): 3.466263
Loss: 1.093 (not good)
Acc: 0.622
Elapsed time (seconds): 7.084
Avg execution time per forward(ms): 0.00363
Profile for qnnpack for evaluation:

----------------------  ----------------  --------------  -----------  ---------  ------------  ---------------
Name                    Self CPU total %  Self CPU total  CPU total %  CPU total  CPU time avg  Number of Calls
----------------------  ----------------  --------------  -----------  ---------  ------------  ---------------
mul                               66.18%          3.379s       66.49%     3.395s      12.573ms              270
sum                               12.98%       662.933ms       14.21%  725.287ms     959.374us              756
quantized::linear                 12.45%       635.799ms       12.45%  635.799ms      17.661ms               36
_cat                               3.14%       160.059ms        3.23%  164.724ms       6.101ms               27
relu                               1.33%        67.692ms        1.34%   68.278ms       2.529ms               27
fill_                              1.17%        59.914ms        1.17%   59.917ms      76.522us              783
index_select                       0.68%        34.661ms        1.11%   56.808ms      80.923us              702
empty                              0.38%        19.191ms        0.38%   19.191ms       8.529us             2250
quantize_per_tensor                0.37%        18.920ms        0.37%   18.930ms       2.103ms                9
copy_                              0.35%        17.947ms        0.35%   17.954ms      34.394us              522
embedding                          0.14%         7.034ms        2.52%  128.492ms      91.519us             1404
...
Metrics:
Size (MB): 3.443591
Loss: 0.580 (very good)
Acc: 0.720
Elapsed time (seconds): 6.978
Avg execution time per forward(ms): 0.00427
Hmm, one hypothesis that would fit this data is that fbgemm is not enabled, and both fbgemm and qnnpack are taking the fallback paths.
cc @dskhudia, any tips?
pintonos
(Pintonos)
September 30, 2020, 7:15am
7
@Vasiliy_Kuznetsov What does such a fallback path look like? What happens in that case?
dskhudia
(Daya Khudia)
October 8, 2020, 11:15pm
8
@pintonos By performance do you mean the loss? It should be the same (or close enough) with both. I see similar execution times for quantized::linear with both fbgemm and qnnpack.
pintonos
(Pintonos)
October 9, 2020, 6:16am
9
@dskhudia Yes, I mean the loss. I tried to run it on a bigger dataset and it seems to work now…