Recently, I deployed my models to Android using libtorch (1.6). The models are quantized and exported with torch.jit.script. When I run inference in C++, I find that a model with 0.8M parameters has the same CPU usage as a model with 2M parameters. I have set the thread count to 1 using the following code:
How can I tell whether QNNPACK is actually being used when I run inference?