I have a PyTorch scripted model with fp32 datatype.
I want to measure the quantized performance on mobile with QNNPACK (i.e., take this same fp32 model but run inference in int8).
I just want to know, for the same net architecture, the performance difference between fp32 and int8.
Does PyTorch have this kind of tool, like TensorRT's `trtexec --int8` with an fp32 model?
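To be concrete, this is roughly what I have in mind (a sketch using post-training static quantization with the qnnpack engine; `TinyNet` is just a stand-in for my real network, and the calibration pass here uses random data instead of a real dataset):

```python
import torch
import torch.nn as nn
import torch.quantization as tq

# Placeholder network standing in for my real fp32 model
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # marks where fp32 -> int8 happens
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()  # marks where int8 -> fp32 happens

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

model_fp32 = TinyNet().eval()
torch.jit.script(model_fp32).save("model_fp32.pt")

# Same architecture, but int8 weights/activations via post-training static
# quantization, targeting the qnnpack backend used on ARM phones
torch.backends.quantized.engine = "qnnpack"
model_fp32.qconfig = tq.get_default_qconfig("qnnpack")
prepared = tq.prepare(model_fp32)
prepared(torch.randn(1, 3, 224, 224))   # calibration pass (use real data)
model_int8 = tq.convert(prepared)
torch.jit.script(model_int8).save("model_int8.pt")
```

With both `model_fp32.pt` and `model_int8.pt` saved, I could benchmark each on the device, but I was hoping there is a ready-made tool for this.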
This thread might be useful for you: Speed benchmarking on android?
Please reach out to the mobile team if this script doesn't work as expected.
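In case it helps, invoking the `speed_benchmark_torch` binary from that thread on the device usually looks something like this (the model paths and input dims are placeholders for your setup, and the exact flags may vary with your build):

```
# run once per saved model; adjust input_dims to match your network
./speed_benchmark_torch --model=model_fp32.pt \
    --input_dims="1,3,224,224" --input_type=float \
    --warmup=10 --iter=50
./speed_benchmark_torch --model=model_int8.pt \
    --input_dims="1,3,224,224" --input_type=float \
    --warmup=10 --iter=50
```

Note that the int8 model still takes float input here, since the quantization of the input tensor happens inside the model.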