I am trying to use Glow to speed up inference for a custom PyTorch model based on the MobileNet architecture. The model was converted to ONNX format, and I modified the resnet50 example to build a bundle for my model.
Inference time for the original traced PyTorch model is ~27 ms per image. For the compiled bundle, inference time is ~800 ms per image.
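For reference, this is roughly how I time the bundle, adapted from the resnet50 example's `main`. The names `my_model`, `my_model.h`, and the `*_MEM_SIZE` macros are placeholders for whatever the bundle generator emitted for my network, and loading the generated `.weights` file / writing the input image into the mutable buffer are elided:

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

#include "my_model.h" // auto-generated bundle header; declares my_model()
                      // and the MY_MODEL_*_MEM_SIZE macros (placeholder names)

int main() {
  // Buffers sized with the macros from the generated header. In the real
  // harness, constantWeight is filled from the generated .weights file, the
  // input image is written into mutableWeight at its placeholder offset, and
  // the allocations respect the alignment reported in the bundle config.
  uint8_t *constantWeight = (uint8_t *)std::malloc(MY_MODEL_CONSTANT_MEM_SIZE);
  uint8_t *mutableWeight = (uint8_t *)std::malloc(MY_MODEL_MUTABLE_MEM_SIZE);
  uint8_t *activations = (uint8_t *)std::malloc(MY_MODEL_ACTIVATIONS_MEM_SIZE);

  // Warm-up run so first-call overhead is not measured.
  my_model(constantWeight, mutableWeight, activations);

  constexpr int kRuns = 100;
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < kRuns; ++i)
    my_model(constantWeight, mutableWeight, activations);
  auto end = std::chrono::steady_clock::now();

  std::printf("avg inference: %.2f ms\n",
              std::chrono::duration<double, std::milli>(end - start).count() /
                  kRuns);

  std::free(constantWeight);
  std::free(mutableWeight);
  std::free(activations);
}
```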
Taking into account that Glow uses only one core during inference, and that I have a 12-core (24-thread) CPU, the expected performance for multi-core inference with Glow would be about 800 ms / 24 ≈ 33 ms per image. That is an optimistic lower bound: it assumes perfect scaling of throughput across independent inferences, not a speedup of a single inference.
- Performance of the compiled bundle is much worse than that of the original PyTorch model. Is this expected? Have I missed some optimization steps? Should I use a different approach to get better performance?
- How can I run multi-core inference using Glow with the CPU backend? (A sketch of what I have in mind is below.)
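For the second question, this is the approach I imagine, though I am not sure it is valid: since the constant weights are read-only during inference, I assume one copy can be shared by all threads, while each thread gets its own mutable-weight and activation buffers and calls the bundle entry point independently. `my_model` and the `*_MEM_SIZE` macros again stand in for the names in my generated header:

```cpp
#include <cstdint>
#include <cstdlib>
#include <thread>
#include <vector>

#include "my_model.h" // auto-generated bundle header (placeholder name)

int main() {
  const unsigned numThreads = std::thread::hardware_concurrency();

  // One shared read-only constant-weight buffer; real code fills it from
  // the generated .weights file. (Assumption: sharing is safe because the
  // bundle never writes to this buffer.)
  uint8_t *constantWeight = (uint8_t *)std::malloc(MY_MODEL_CONSTANT_MEM_SIZE);

  std::vector<std::thread> workers;
  for (unsigned t = 0; t < numThreads; ++t) {
    workers.emplace_back([constantWeight] {
      // Each thread owns its input/output and scratch memory.
      uint8_t *mutableWeight =
          (uint8_t *)std::malloc(MY_MODEL_MUTABLE_MEM_SIZE);
      uint8_t *activations =
          (uint8_t *)std::malloc(MY_MODEL_ACTIVATIONS_MEM_SIZE);
      // ... write this thread's image into mutableWeight at the input
      //     placeholder offset ...
      my_model(constantWeight, mutableWeight, activations);
      // ... read the result from mutableWeight at the output offset ...
      std::free(mutableWeight);
      std::free(activations);
    });
  }
  for (auto &w : workers)
    w.join();

  std::free(constantWeight);
}
```

Note that this only parallelizes across images (throughput); each individual inference would still take the single-core time. Is this the intended way to use bundles on multiple cores, or does Glow provide something built in?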