I have tried the Ahead of Time (AOT) compiled executable bundles in Glow using this documentation.
These are the two models I have tried (PyTorch models converted to ONNX):
ResNet-18
VGG-16
Command for compiling the model to a Glow AOT bundle:
model-compiler -backend=CPU -target=x86-64 -model=model.onnx -emit-bundle=./Model -bundle-api=dynamic
Command for generating the executable:
clang++ main.cpp model.o -lpng -o Model
Glow bundle inference time vs. PyTorch model inference time:
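For a fair comparison, both sides should be measured the same way: a few warm-up runs first, then an average over many iterations. A minimal, generic timing harness looks like the following; the lambda is a stand-in for one forward pass (e.g. invoking the compiled Model binary or calling the PyTorch model), not the exact script used for the numbers above.

```python
import time

def benchmark_ms(infer, warmup=10, iters=100):
    """Average latency of infer() in milliseconds, after warm-up."""
    for _ in range(warmup):
        infer()  # warm caches and trigger any lazy initialization
    start = time.perf_counter()
    for _ in range(iters):
        infer()
    return (time.perf_counter() - start) * 1000.0 / iters

if __name__ == "__main__":
    # Stand-in workload; replace with one real forward pass.
    print(f"{benchmark_ms(lambda: sum(range(10000))):.3f} ms/inference")
```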
Why is there such a huge gap between Glow and PyTorch inference performance?
OSS backends in Glow are not primarily focused on optimizing CPU performance. Additionally, it really depends on which CPU architecture you're using. For example, we have a convolution implementation that we know performs well on x86 but poorly on ARM. Since you're on x86, I should note that I think the fast x86 version was disabled accidentally and needs to be fixed; see this comment, which I don't think has been resolved yet.
Is there any way I can improve the inference time of the Glow-compiled model?
I imagine enabling the well-performing conv kernel will get you much closer to PyTorch's performance. However, as I mentioned, CPU performance is not something we focus on heavily in general. One way you could probably reach parity with PyTorch is to copy its convolution kernel source code into libjit*.cpp as the source for the kernels Glow compiles. I don't know how easy that would be, but it should be doable. You would then benefit from Glow's high-level graph optimizations and memory optimization together with PyTorch's optimized kernels.