Decrease in the Speed of Quantized Model

mohit7 · February 4, 2020, 12:11pm

My base model is SqueezeNet. After porting it to Android, the model's execution time is around 160 ms. Quantizing the model should normally reduce that time, but instead it increased to 220 ms. Is something wrong, or is this expected?

jerryzh168 · February 14, 2020, 6:52pm

Are you sure everything is properly quantized? Could you print the model before and after quantization?
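
For reference, here is a minimal sketch of what that check could look like with eager-mode post-training static quantization. `TinyNet` and the random calibration input are hypothetical stand-ins, not the model from this thread, and the qconfig simply follows whatever quantized engine the local PyTorch build selects (an assumption; an Android build would typically use `qnnpack`).

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in model; replace with the real network."""

    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> quantized
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()  # quantized -> float

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)


float_model = TinyNet().eval()
print("Before quantization:")
print(float_model)

# Assumption: use the engine this build of PyTorch has selected by default.
float_model.qconfig = torch.quantization.get_default_qconfig(
    torch.backends.quantized.engine
)

# Insert observers, run one calibration batch, then convert to quantized modules.
prepared = torch.quantization.prepare(float_model)
prepared(torch.randn(1, 3, 32, 32))
quantized_model = torch.quantization.convert(prepared)

print("After quantization:")
print(quantized_model)

# Any layer still listed here was left in float and will run unquantized.
leftover_float_convs = [
    name for name, m in quantized_model.named_modules() if type(m) is nn.Conv2d
]
print("Float conv layers remaining:", leftover_float_convs)
```

In the second printout, quantized layers show up with a `Quantized` prefix and a scale and zero point, so any layer that still prints as a plain `Conv2d` or `Linear` was skipped.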

mohit7 · February 18, 2020, 9:08am

@jerryzh168 I printed the model, and everything is properly quantized.
In SqueezeNet I had to replace torch.cat with nn.quantized.FloatFunctional.cat, so I think that function is causing the slowdown; this is also mentioned in other threads.
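
For readers following along, the replacement described above might look roughly like this. It is a sketch of a SqueezeNet-style Fire block, not the exact code used in this thread; the layer names mirror torchvision's `Fire` module, and the `cat` attribute name is my own choice.

```python
import torch
import torch.nn as nn
from torch.nn.quantized import FloatFunctional


class Fire(nn.Module):
    """SqueezeNet-style Fire block with a quantization-friendly concat."""

    def __init__(self, inplanes, squeeze_planes, expand1x1_planes, expand3x3_planes):
        super().__init__()
        self.squeeze = nn.Conv2d(inplanes, squeeze_planes, kernel_size=1)
        self.squeeze_activation = nn.ReLU()
        self.expand1x1 = nn.Conv2d(squeeze_planes, expand1x1_planes, kernel_size=1)
        self.expand1x1_activation = nn.ReLU()
        self.expand3x3 = nn.Conv2d(
            squeeze_planes, expand3x3_planes, kernel_size=3, padding=1
        )
        self.expand3x3_activation = nn.ReLU()
        # Module wrapper around cat so an observer can be attached to its output.
        # "cat" is a hypothetical attribute name chosen for this sketch.
        self.cat = FloatFunctional()

    def forward(self, x):
        x = self.squeeze_activation(self.squeeze(x))
        branches = [
            self.expand1x1_activation(self.expand1x1(x)),
            self.expand3x3_activation(self.expand3x3(x)),
        ]
        # Was: torch.cat(branches, 1)
        return self.cat.cat(branches, dim=1)


# In float mode the block behaves like the original torch.cat version.
fire = Fire(8, 4, 8, 8).eval()
out = fire(torch.randn(1, 8, 16, 16))
print(out.shape)
```

In float mode the wrapper just calls `torch.cat`; the difference only appears after `prepare` and `convert`, when it is swapped for its quantized counterpart.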

masahi · February 20, 2020, 4:14am

@mohit7 My answer in the thread "Quantized::cat running time is slower than fp32 model" may help answer your question.