Slow inference speed with Glow

Hi,

I have tried the ahead-of-time (AOT) compiled executable bundles in Glow using this documentation.

These are the two models I have tried (PyTorch models converted to ONNX):

  1. ResNet 18
  2. VGG 16

Command for compiling the model to a Glow AOT bundle:
model-compiler -backend=CPU -target=x86-64 -model=model.onnx -emit-bundle=./Model -bundle-api=dynamic

Command for generating the executable:
clang++ main.cpp model.o -lpng -o Model
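
In case it matters, the main.cpp essentially follows the dynamic-bundle driver pattern from the AOT documentation – roughly the sketch below, with the libpng preprocessing and output handling omitted. The model.h / model_config / model() / model.weights.bin names are what I would expect model-compiler to generate for model.onnx; adjust them to whatever it actually emitted.

```cpp
// Simplified dynamic-API bundle driver (image preprocessing and output
// handling omitted). The bundle symbol/file names here are assumptions
// based on the AOT docs for a bundle built from model.onnx.
#include <cstdint>
#include <cstdio>
#include <cstdlib>

#include "model.h" // header generated by model-compiler into ./Model

// Allocate a buffer with the alignment requested by the bundle config.
static uint8_t *alignedAlloc(uint64_t size, uint64_t alignment) {
  void *ptr = nullptr;
  posix_memalign(&ptr, alignment, size);
  return static_cast<uint8_t *>(ptr);
}

int main() {
  const BundleConfig &cfg = model_config;

  // Constant weights are loaded from the weights file emitted next to model.o.
  uint8_t *constantWeights =
      alignedAlloc(cfg.constantWeightVarsMemSize, cfg.alignment);
  FILE *weightsFile = fopen("model.weights.bin", "rb");
  fread(constantWeights, 1, cfg.constantWeightVarsMemSize, weightsFile);
  fclose(weightsFile);

  // Mutable weights hold the input/output placeholders; activations are scratch.
  uint8_t *mutableWeights =
      alignedAlloc(cfg.mutableWeightVarsMemSize, cfg.alignment);
  uint8_t *activations = alignedAlloc(cfg.activationsMemSize, cfg.alignment);

  // ... copy the preprocessed image into the input placeholder, using the
  //     offsets/sizes from cfg.symbolTable ...

  // Run a single inference.
  model(constantWeights, mutableWeights, activations);

  // ... read the result out of the output placeholder ...

  free(constantWeights);
  free(mutableWeights);
  free(activations);
  return 0;
}
```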

Glow bundle inference time vs. PyTorch model inference time:

System configuration:
CPU: i3-6006U
RAM: 8 GB
OS: Ubuntu 16.04
LLVM Version: 8.0.1
Clang Version: 8.0.1

1) ResNet 18:
Glow bundle: 620 ms
PyTorch: 100 ms
2) VGG 16:
Glow bundle: 7680 ms
PyTorch: 410 ms

Questions:

  1. Why is there such a huge gap between Glow and PyTorch inference performance?
  2. Is there any way I can improve the inference time of the Glow-compiled model?

Thanks and Regards
Adithya

Hi Adithya,

  1. Why is there such a huge gap between Glow and PyTorch inference performance?

The OSS backends in Glow are not primarily focused on optimizing CPU performance. Additionally, it really depends on what CPU architecture you're using; for example, we have a convolution implementation that we know performs well on x86 but poorly on ARM. Since you're on x86, I should note that I believe the fast x86 version was disabled accidentally and needs to be fixed – see this comment, which I don't think has been resolved yet.

  2. Is there any way I can improve the inference time of the Glow-compiled model?

I imagine enabling the well-performing conv kernel will get you much closer to PyTorch's performance. However, as I mentioned, CPU performance is not something we focus on heavily in general. One way you could probably get to parity with PyTorch is to copy their convolution kernel source code and use it in libjit*.cpp as the source for the kernels Glow compiles. I don't know how easily that could be done, but it should be doable. You would then benefit from Glow's high-level graph optimizations and memory optimizations together with PyTorch's optimized kernels.
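
To make that concrete: the kernels in libjit*.cpp are plain C/C++ functions over raw buffers, and it is their bodies you would swap out. Below is a simplified, hypothetical example of a direct NHWC float convolution written in that style – the name and signature are illustrative only and do not match the actual libjit entry points – just to show the kind of function a faster (e.g. PyTorch-derived) implementation would replace.

```cpp
#include <cstddef>

// Hypothetical direct convolution over float NHWC buffers, written in the
// style of the kernels in libjit*.cpp. Illustrative only: the real libjit
// convolution entry point has a different name and signature.
extern "C" void example_conv2d_f(
    float *out, const float *in, const float *filter, const float *bias,
    size_t N, size_t H, size_t W, size_t C,   // input dims (NHWC)
    size_t OH, size_t OW, size_t OC,          // output dims / channels
    size_t KH, size_t KW,                     // filter height/width
    size_t strideH, size_t strideW, size_t padT, size_t padL) {
  for (size_t n = 0; n < N; n++) {
    for (size_t oh = 0; oh < OH; oh++) {
      for (size_t ow = 0; ow < OW; ow++) {
        for (size_t oc = 0; oc < OC; oc++) {
          float sum = bias ? bias[oc] : 0.0f;
          for (size_t kh = 0; kh < KH; kh++) {
            for (size_t kw = 0; kw < KW; kw++) {
              // Map the output coordinate back onto the (padded) input.
              ptrdiff_t ih = (ptrdiff_t)(oh * strideH + kh) - (ptrdiff_t)padT;
              ptrdiff_t iw = (ptrdiff_t)(ow * strideW + kw) - (ptrdiff_t)padL;
              if (ih < 0 || iw < 0 || ih >= (ptrdiff_t)H || iw >= (ptrdiff_t)W)
                continue;
              // Accumulate over input channels for this filter tap.
              for (size_t c = 0; c < C; c++) {
                sum += in[((n * H + ih) * W + iw) * C + c] *
                       filter[((oc * KH + kh) * KW + kw) * C + c];
              }
            }
          }
          out[((n * OH + oh) * OW + ow) * OC + oc] = sum;
        }
      }
    }
  }
}
```

A naive loop nest like this is only the starting point – most of the gap you're seeing comes from how well that inner loop is blocked and vectorized, which is exactly what an optimized kernel (PyTorch's, or the fast x86 path mentioned above) provides.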