CPU Inference optimization?


I recently moved from TensorFlow to PyTorch, and from a development standpoint it's brilliant!
However, we (unfortunately) serve our models on CPU only, and we noticed a
huge drop in performance when comparing the TensorFlow and PyTorch models.

As I would love to keep using PyTorch, I was wondering if anyone has some good
tips/hints/best practices to share on how to get PyTorch to perform well in a CPU-only environment.
Getting a GPU into production is not possible at the moment due to some weird contract issues.


Hi Ooki,

If you can put together a representative benchmark script, I can try to figure out how to optimize PyTorch CPU performance for you in a short amount of time.

I don't have a representative benchmark right now, but what should I look for with CPU inference?
Any flags to set? Any point in compiling from source? Any don't-do's?

And last but not least, is there any way to force PyTorch to use all the cores on the CPU?
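For context, the only knobs I've found so far are the intra-op thread-count controls (plus the OMP_NUM_THREADS / MKL_NUM_THREADS environment variables); a minimal sketch:

```python
import torch

# Ask PyTorch to use 4 intra-op worker threads for CPU kernels.
# By default it typically uses one thread per physical core.
torch.set_num_threads(4)

# Confirm the setting took effect.
print(torch.get_num_threads())
```

Setting the environment variables before the process starts has a similar effect, since the OpenMP/MKL thread pools read them at load time.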

Another post says that PyTorch is optimized for GPU only. I train the model with PyTorch and copy the weights to Caffe manually, which is quite optimized for CPU inference; moreover, you can use Intel Caffe, which is much more optimized for Intel processors. You can also try ONNX (https://github.com/onnx) to convert your model into other frameworks that are optimized for CPU.

Thanks lolongcovas, I'm just going to transfer it to Caffe2 and run it there for production.
Also, I found this nice tutorial, so I guess the PyTorch guys knew my needs before I did :smiley:

Can you tell us what kind of network you are using? Is it a CNN like ResNet, for example? Also, do you use batch sizes > 1 during inference? That way we can get a better idea of your use cases and of what is slow compared to TF.

As far as I understand, a year ago the best way to optimize a model for CPU was to transfer it to Caffe2.
Now, with PyTorch 1.0, what is the best option? I describe my situation and thoughts below.

In my case the architecture is this one: https://github.com/huggingface/pytorch-openai-transformer-lm/blob/master/model_pytorch.py. I tried to use ONNX, but some of the model's operations are not supported, in particular masked selects such as `clf_h = clf_h[flat == self.clf_token, :]`. So I guess it's not an option for me.
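For what it's worth, that masked select can sometimes be rewritten with `nonzero` + `index_select`, which may or may not export more cleanly depending on the opset; a toy sketch with made-up shapes and token ids:

```python
import torch

# Toy stand-in for the transformer head's selection: pick the
# hidden states at positions holding a special classifier token.
clf_token = 7
flat = torch.tensor([1, 7, 3, 7])   # flattened token ids
clf_h = torch.randn(4, 16)          # per-position hidden states

# Boolean masked select (the form ONNX export chokes on here):
a = clf_h[flat == clf_token, :]

# Equivalent rewrite via explicit integer indices:
idx = (flat == clf_token).nonzero().squeeze(1)
b = torch.index_select(clf_h, 0, idx)

assert torch.equal(a, b)
```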

So with PyTorch alone I suppose there are two (non-exclusive) options: 1) TorchScript (v1.0) and 2) compilation tricks.

  1. To turn the model into TorchScript, the tracing mechanism should be enough, I guess, because I don't have complex control flow (I haven't tried it yet, though).
  2. By compilation tricks I mean PyTorch builds with MKL-DNN or anything else that can accelerate inference on CPU (I found this as well: https://github.com/intel/pytorch#bkm-on-xeon).
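Option 1 above can be sketched like this; the module here is a made-up stand-in without data-dependent control flow, which is the case tracing handles safely:

```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    """Stand-in module with no data-dependent control flow."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.fc(x))

model = Toy().eval()
example = torch.randn(2, 16)

# Tracing records the ops executed for this example input
# and bakes them into a standalone script module.
traced = torch.jit.trace(model, example)
traced.save("toy_traced.pt")

# For serving: load the artifact and run without autograd overhead.
loaded = torch.jit.load("toy_traced.pt")
with torch.no_grad():
    out = loaded(example)
```

The saved file can be loaded without the original Python class, which is handy for a serving process.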

I installed PyTorch 1.0 with `pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html`. It's built without MKL-DNN, right? I didn't notice any performance boost just by switching to this version from 0.4.
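(In more recent PyTorch versions there is a direct way to check what a build shipped with; I'm not sure these APIs exist on 0.4 or the 1.0 nightlies:)

```python
import torch

# True if this build of PyTorch was compiled with MKL-DNN support.
print(torch.backends.mkldnn.is_available())

# Full build configuration string (compiler, BLAS, OpenMP, MKL-DNN, ...).
print(torch.__config__.show())
```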

I just built PyTorch 1.0 (stable) with MKLDNN and it doesn’t help.

UPD 12/9: Tracing doesn’t help either =(