Hi there!
I am writing to ask whether there is an efficient way to speed up the loading time of a quantized model (produced with post-training static int8 quantization).
When I break down the loading process of the full-precision (i.e., fp32) model and the int8 model, the int8 model takes much longer to load than the fp32 one (see below). I believe this is due to the quantized::conv2d_prepack() step, but I don't know how to optimize or speed it up.
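For reference, here is roughly how I timed the two loads. This is only a minimal sketch: the file names are placeholders, and it assumes both models were saved as TorchScript archives with torch.jit.save.

```python
import time
import torch

# Hypothetical paths; substitute your own saved models.
FP32_PATH = "model_fp32.pt"
INT8_PATH = "model_int8.pt"

def timed_load(path):
    """Load a TorchScript archive and report the wall-clock loading time."""
    start = time.time()
    model = torch.jit.load(path, map_location="cpu")
    elapsed = time.time() - start
    print(f"{path}: {elapsed:.3f} s")
    return model

fp32_model = timed_load(FP32_PATH)  # loads quickly
int8_model = timed_load(INT8_PATH)  # much slower; the time seems to go into quantized::conv2d_prepack()
```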
Thanks for reading, and any pointers would be appreciated!
MK