Model bundle performance / multi-core inference

Hello!

I am trying to use Glow to increase inference speed for a custom PyTorch model based on the MobileNet architecture. The model was converted to ONNX format, and I modified the ResNet50 example to build a model bundle for my model.
Inference time for the original traced PyTorch model is ~27 ms per image. For the compiled bundle, inference time is ~800 ms.
Taking into account that Glow uses only one core during inference and I have a 12-core (24-thread) CPU, the expected performance for multi-core inference using Glow would be 800 ms / 24 ≈ 33 ms (this is the most optimistic estimate).

My questions:

  1. Performance of the compiled model is worse compared with the original PyTorch model. Is this expected? Have I missed some optimization steps? Should I use a different approach to get better performance?
  2. How can I run multi-core inference using Glow with CPU backend?
  1. Performance of the compiled model is worse compared with the original PyTorch model. Is this expected? Have I missed some optimization steps? Should I use a different approach to get better performance?

PyTorch works hard to implement optimized kernels for CPU. In Glow, we haven’t spent much time recently optimizing the Conv kernels in our CPU backend; we have focused much more on custom AI hardware accelerators such as the NNPI and Habana backends. IIRC, when we last focused on CPU performance a couple of years ago, it was tuned for a few specific models of x86 CPUs, so your performance will depend on which CPU you’re using.

  2. How can I run multi-core inference using Glow with the CPU backend?

We don’t currently support this. See prior discussion here: https://github.com/pytorch/glow/issues/1749

@jfix does this mean that an evaluation board with any kind of CPU, e.g. an Arm Cortex, won’t be able to take advantage of Glow for inference?

You can absolutely run on CPU architectures, and performance should still be pretty good; it just might not be as good as other frameworks. You could always modify or add to the Glow CPU backend’s libjit with kernels that are better targeted for ARM for the ops you care about.

I know NXP has had a lot of success targeting ARM Cortex with Glow: https://media.nxp.com/news-releases/news-release-details/industrys-first-mcu-based-implementation-glow-neural-network – it may be worth posting an issue on Glow’s GitHub tagging mciprian13 from NXP to find out more.

Thanks a lot @jfix for quick response. I will check NXP offerings.
Does Glow still use one CPU core for inference? I can see the multi-core inference issue is still open on GitHub.

Yes, we haven’t really invested much in multicore execution; we have focused on accelerators such as NNPI and Habana. That said, I don’t expect it would be hard to go data parallel and split a batch across the number of cores you have.