I just built PyTorch v1.0.1 from source using the following commands (I did it twice, once on a Raspberry Pi 3+ and a second time with QEMU emulating an armv7, yielding the same results):
I did get a valid wheel, but after running a test script that compares prediction performance against Keras+TF, I found that the PyTorch predictions were roughly twice as slow as the Keras+TF ones.
What is weird is that when I run the same test (multiple times) on an x86 processor (using the existing PyTorch wheels), the PyTorch predictions come out slightly faster than Keras+TF.
Is there anything I need to do to make the built PyTorch run predictions faster?
I did tweak the compile flags and runtime flags (like NUM_CPUS=4, OMP_NUM_THREADS=4 and MKL_NUM_THREADS=4), but the best result I got was still the "twice as slow" prediction time.
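For reference, this is how I set the runtime flags (a sketch; `predict.py` stands in for my actual test script, and the value 4 matches the Pi 3's four cores — these variables have to be exported before the Python process starts):

```shell
# Thread-count variables read by OpenMP/MKL at startup
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
# python predict.py   # hypothetical test script
```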
(Needless to say, the models used for testing TF and PyTorch were the same, with the same number of parameters and the same kind of input. The PyTorch model was exported with JIT.)
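The timing loop looks roughly like this (a minimal sketch, not my real script: the `torch.nn.Linear` stand-in model, the input shape and the iteration count are all placeholders, but the JIT export via `torch.jit.trace` and the `no_grad` inference loop match what I do):

```python
import time
import torch

# Placeholder model; the real test used the same architecture as the Keras model
model = torch.nn.Linear(16, 4)
model.eval()

example = torch.randn(1, 16)
traced = torch.jit.trace(model, example)  # export with JIT, as in the real test

with torch.no_grad():
    t0 = time.perf_counter()
    for _ in range(100):
        traced(example)
    elapsed = time.perf_counter() - t0

print(f"100 predictions in {elapsed:.4f}s")
```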