How to speed up Libtorch C++ inference on CPU?

Hi everyone,

I converted the trained PyTorch model to TorchScript, and I use Libtorch C++ version 1.5.1 (CPU) to deploy my implementation on CPU. However, the inference time of the TorchScript model is unstable (it fluctuates from 5 ms to 30 ms for a batch of 30 images of size 48x48x3). If I export the model to ONNX and deploy it with ONNX Runtime, the runtime is more stable and a bit faster. But I'd like to use Libtorch in my implementation.
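For reference, here is a minimal sketch of the setup I'm describing (the model path, NCHW input layout, and timing loop are just illustrative, not my exact code):

```cpp
#include <torch/script.h>
#include <chrono>
#include <iostream>

int main() {
  // Load the exported TorchScript module (path is hypothetical).
  torch::jit::script::Module module = torch::jit::load("model.pt");
  module.eval();

  // Batch of 30 images, assumed NCHW layout: 30 x 3 x 48 x 48.
  torch::Tensor input = torch::randn({30, 3, 48, 48});

  torch::NoGradGuard no_grad;
  for (int i = 0; i < 100; ++i) {
    auto start = std::chrono::steady_clock::now();
    torch::Tensor output = module.forward({input}).toTensor();
    auto end = std::chrono::steady_clock::now();
    std::cout << "iter " << i << ": "
              << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms\n";
  }
  return 0;
}
```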

Can you please give some suggestions to improve the runtime of TorchScript models on CPU? Do I need to set any environment variables in the .bashrc file?
Thank you in advance!

Try setting OMP_NUM_THREADS=1
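You can either export OMP_NUM_THREADS=1 in your .bashrc (or before launching the binary), or pin the thread counts programmatically. A rough sketch of the programmatic route, assuming the ATen parallel API is what you want (call it early, before any inference):

```cpp
#include <ATen/Parallel.h>
#include <torch/script.h>

int main() {
  // Roughly equivalent to exporting OMP_NUM_THREADS=1:
  // limit intra-op parallelism so a small batch is not dominated
  // by thread scheduling overhead.
  at::set_num_threads(1);

  // Optionally also pin inter-op parallelism; this must be set
  // before any parallel work has started, or it may throw.
  at::set_num_interop_threads(1);

  // ... load the TorchScript module and run inference as usual ...
  return 0;
}
```

With a batch this small, most of the per-call variance tends to come from thread contention, so measuring with 1 thread versus the default is a quick way to confirm.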
