My base model is SqueezeNet. After porting it to Android, the model's execution time is around 160 ms. After quantizing the model the time should generally decrease, but instead it increased to 220 ms. Is something wrong, or can this happen?
Are you sure everything is properly quantized? Could you print the model before and after quantization?
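For reference, here is a minimal sketch of what that check looks like with eager-mode static quantization. The toy model and shapes are placeholders, not the actual SqueezeNet code; the point is that printing the converted model should show quantized module types (e.g. `QuantizedConv2d`) instead of the float ones:

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for the real network.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
model.eval()

# Eager-mode post-training static quantization.
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
prepared = torch.quantization.prepare(model)
prepared(torch.randn(1, 3, 8, 8))  # calibration pass to collect activation stats
quantized = torch.quantization.convert(prepared)

print(model)      # float modules, e.g. Conv2d
print(quantized)  # quantized modules, e.g. QuantizedConv2d
```

If any layer still prints as its float type after `convert`, that layer was not quantized and will force dequantize/quantize round-trips at runtime.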
@jerryzh168 I printed the model and everything is properly quantized.
In SqueezeNet I had to replace torch.cat with nn.quantized.FloatFunctional.cat, so I think that function is causing the slowdown; the same issue is also mentioned in other threads.
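For anyone following along, the replacement looks roughly like this. The module below is a hypothetical minimal example, not the actual SqueezeNet Fire module; it just shows the `torch.cat` → `FloatFunctional.cat` swap:

```python
import torch
import torch.nn as nn

class CatBlock(nn.Module):
    def __init__(self):
        super().__init__()
        # FloatFunctional wraps the cat op so observers can record its
        # activation range and it can be swapped for the quantized cat
        # during conversion. A plain torch.cat cannot be quantized this way.
        self.ff = nn.quantized.FloatFunctional()

    def forward(self, a, b):
        # Equivalent to torch.cat([a, b], dim=1) in float mode.
        return self.ff.cat([a, b], dim=1)

a = torch.randn(1, 2, 4, 4)
b = torch.randn(1, 3, 4, 4)
out = CatBlock()(a, b)
print(out.shape)  # torch.Size([1, 5, 4, 4])
```

The quantized cat has to requantize its inputs to a common scale and zero point, which is one plausible reason it can be slower than the fp32 `torch.cat`.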
@mohit7 My answer in the thread "Quantized::cat running time is slower than fp32 model" may help answer your question.