[Caffe2] QNNPack performance issue on Jetson TX2

I’m trying out QNNPack by running the mobilenet_v2_quant model on a Jetson TX2. I’m seeing a significant performance gap between the Nvidia Denver cores and the ARMv8 A57 cores, and I can’t figure out why.

For folks not familiar with the Nvidia Jetson TX2, it has a 6-core CPU: 2 Nvidia Denver cores and 4 ARM A57 cores. You can turn these on and off by switching between different modes.
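Before comparing numbers across modes, it may be worth confirming which cores are actually online in each one. The following is a minimal sketch, assuming the usual Linux sysfs layout where `/sys/devices/system/cpu/online` holds a list such as `0,3-5`. `parse_cpu_list` and `online_cpus` are hypothetical helper names for illustration, not part of Caffe2 or QNNPack.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0,3-5" into [0, 3, 4, 5]."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty string or stray commas
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def online_cpus(path="/sys/devices/system/cpu/online"):
    """Return the online CPU indices, or None if the sysfs file is missing."""
    p = Path(path)  # path is an assumption about the board's sysfs layout
    if not p.exists():
        return None
    return parse_cpu_list(p.read_text())


if __name__ == "__main__":
    print("online CPUs:", online_cpus())
```

Running this once per mode shows whether the mode really enabled the cores you expected. Which indices belong to the Denver cores and which to the A57 cores is something to verify on the board itself.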

When I run with all 6 cores at 2 GHz, I get roughly 15 fps.

When running with only the 2 Denver cores, it drops to around 6 fps.

However, when running with only the 4 A57 cores, it drops to a jaw-dropping 1 fps.
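For anyone who wants to reproduce the comparison, here is a rough sketch of how the fps figures could be measured consistently across modes. It is not the script used for the numbers above. `measure_fps` is a hypothetical helper, and `run_once` stands in for whatever single-inference call you use. The optional `cpus` argument relies on `os.sched_setaffinity`, which is Linux-only, and the index set you pass is an assumption to check against your board.

```python
import os
import time


def measure_fps(run_once, warmup=5, iters=50, cpus=None):
    """Time `iters` calls of run_once() after `warmup` untimed calls.

    run_once is a placeholder for one inference call.
    cpus, if given, is a set of CPU indices to pin the caller to.
    """
    if cpus is not None and hasattr(os, "sched_setaffinity"):
        # Pins the calling thread; threads created later inherit the mask,
        # but worker threads that already exist are not affected.
        os.sched_setaffinity(0, cpus)

    for _ in range(warmup):
        run_once()  # untimed warm-up (caches, lazy initialisation)

    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start

    # Guard against a zero reading from a very coarse timer.
    return iters / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Dummy workload standing in for a real inference call.
    fps = measure_fps(lambda: sum(range(10000)), warmup=2, iters=20)
    print("fps: %.1f" % fps)
```

Pinning has to happen before the framework creates its thread pool to have any effect on the workers, so in practice the affinity call would go at the very start of the program.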

Interestingly, I also tried a Raspberry Pi with 4 ARMv7 cores at 1.2 GHz (Model 2B, I think), and it reaches around 10 fps.

I can’t find a proper explanation for this behavior, and I’m hoping someone can give me some pointers to figure out what the heck is going on.

Thank you.