I’m trying out QNNPACK by running the mobilenet_v2_quant model on a Jetson TX2. I’m seeing a significant performance gap between the Nvidia Denver cores and the ARMv8 Cortex-A57 cores, and I cannot figure out why.
For folks not familiar with the Nvidia Jetson TX2, it has a 6-core CPU: 2 Nvidia Denver cores and 4 ARM Cortex-A57 cores. It lets you turn these on and off by switching between different modes.
When I run with all 6 cores at 2 GHz, I get about 15 fps, give or take.
When running with the 2 Denver cores only, it drops to around 6 fps.
However, when running with the 4 A57 cores only, it drops to a jaw-dropping 1 fps.
Interestingly, I also tried a Raspberry Pi with 4 ARMv7 cores at 1.2 GHz (Model 2B, I think), and there I get around 10 fps.
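For anyone who wants to reproduce the comparison, here is a minimal sketch of a timing harness that pins the process to a chosen set of cores before measuring. The names `measure_fps`, `run_inference`, and `cpu_ids` are hypothetical and not part of QNNPACK or PyTorch; `run_inference` stands in for one forward pass of the model. Which core IDs belong to the Denver cores and which to the A57 cores is not assumed here and should be checked on the board (for example in /proc/cpuinfo).

```python
import os
import time


def measure_fps(run_inference, n_warmup=5, n_iters=50, cpu_ids=None):
    """Return iterations per second of `run_inference` (a hypothetical callable).

    If `cpu_ids` is given and the platform supports it (Linux), the caller is
    pinned to those cores first. Threads started afterwards inherit the mask;
    thread pools that already exist are not affected.
    """
    if cpu_ids is not None and hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, set(cpu_ids))  # pid 0 means the caller

    # Warm-up runs so one-time setup cost does not skew the result.
    for _ in range(n_warmup):
        run_inference()

    start = time.perf_counter()
    for _ in range(n_iters):
        run_inference()
    elapsed = time.perf_counter() - start

    return n_iters / elapsed


if __name__ == "__main__":
    # Placeholder workload; replace with a single forward pass of the model.
    def dummy_inference():
        sum(i * i for i in range(10_000))

    print(f"{measure_fps(dummy_inference):.1f} fps")
```

Calling it once per core set, with the pinning done before the model and its thread pool are created, should make it easier to tell whether the gap comes from the cores themselves or from how threads are scheduled across them.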
I can’t find a proper explanation for this behavior, and I’m hoping someone can give me some pointers to figure out what the heck is going on.