Hello,
While quantizing my model, I noticed that quantized operations such as quantized::mul and quantized::cat are roughly 10x slower than their fp32 counterparts.
Is the only workaround to wrap those ops with dequant() and quant()? Please refer to the profiling below…
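(The tables were collected with torch.profiler along these lines; a toy model stands in here for my actual network:)

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Toy stand-in for the real network; the profiling setup is the point here.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
x = torch.rand(1, 3, 64, 64)

# Profile one CPU forward pass, including memory usage per op.
with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    model(x)

# Print a per-op summary table like the ones below.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```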
FP32 Profiling (CPU)
--------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name                                Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg       CPU Mem  Self CPU Mem    # of Calls
--------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
aten::mkldnn_convolution                36.28%      14.237ms        37.56%      14.741ms       1.340ms       7.88 Mb           0 b            11
aten::upsample_nearest2d                 6.71%       2.632ms         8.38%       3.288ms     469.671us      15.33 Mb      15.32 Mb             7
aten::mul                                6.10%       2.395ms         6.10%       2.395ms     342.071us      19.41 Mb      19.41 Mb             7
aten::_cat                               5.54%       2.175ms         6.69%       2.626ms     375.100us      19.41 Mb           0 b             7
aten::_cat                               5.28%       2.074ms         6.45%       2.533ms     361.814us      19.41 Mb           0 b             7
aten::upsample_nearest2d                 4.50%       1.766ms         7.26%       2.851ms     407.229us      15.33 Mb      15.32 Mb             7
Quantized Profiling (CPU)
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Name                                 Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg       CPU Mem  Self CPU Mem    # of Calls
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
quantized::mul                           20.94%      21.950ms        21.79%      22.839ms       3.263ms       4.85 Mb           0 b             7
quantized::cat                           18.42%      19.306ms        18.84%      19.741ms       2.820ms       4.85 Mb           0 b             7
quantized::cat                           17.84%      18.692ms        18.27%      19.143ms       2.735ms       4.85 Mb           0 b             7
quantized::conv2d                         8.18%       8.576ms         8.86%       9.285ms       1.326ms       1.02 Mb      -4.08 Mb             7
quantized::batch_norm2d                   4.65%       4.878ms         5.34%       5.598ms     933.033us     980.00 Kb      -7.75 Kb             6
quantized::conv2d                         4.35%       4.561ms         4.90%       5.134ms     733.500us     981.00 Kb      -3.83 Mb             7
quantized::mul                            4.34%       4.553ms         5.13%       5.380ms     768.571us       1.02 Mb           0 b             7
quantized::leaky_relu                     1.73%       1.810ms         2.13%       2.234ms     372.417us     980.00 Kb           0 b             6
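The workaround I have in mind looks roughly like this (a minimal sketch with dummy tensors; the function name, scale/zero-point values, and shapes are placeholders, not from my actual model):

```python
import torch

def mul_cat_fp32(qa, qb, scale, zero_point):
    """Drop back to fp32 for the slow mul/cat ops, then re-quantize."""
    a = qa.dequantize()
    b = qb.dequantize()
    out = torch.cat([a * b, b], dim=1)
    return torch.quantize_per_tensor(out, scale, zero_point, torch.quint8)

# Example usage with dummy quantized inputs (placeholder scale/zero_point).
q = torch.quantize_per_tensor(torch.rand(1, 4, 8, 8), 0.1, 0, torch.quint8)
out = mul_cat_fp32(q, q, 0.1, 0)
```

This avoids quantized::mul and quantized::cat entirely, at the cost of two dequantize/quantize round-trips per call.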