Inference extremely slow on AWS (CPU)?


I’m trying to run inference on a trained model on an AWS EC2 instance (specifically, the c5 series), using CPUs for inference. However, I noticed that the forward pass is extremely slow on CPU compared to even just running inference on my laptop (2018 MacBook Pro, also on CPU). E.g., the entire forward pass takes 0.05 s on my laptop, but a single line in the forward pass (a call to conv2d) takes 0.15 s on the AWS instance.

I’m using PyTorch 1.3.1 and based my AMI on the Deep Learning AMI.

Any ideas why it is so slow, or what I could do to speed it up? (Or, if I should just use an instance with GPUs?)

Edit: I tried timing with and without MKL (like here: Use MKLDNN in pytorch - #4 by LeviViana), and with MKL was actually slower than without it.
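For anyone repeating this timing comparison, a quick sanity check is to confirm which math libraries your PyTorch wheel was actually built with before attributing slowdowns to MKL:

```python
import torch

# Report whether this build links against MKL / MKL-DNN (oneDNN).
# Having them available doesn't guarantee a speedup for every op shape.
print("MKL available:   ", torch.backends.mkl.is_available())
print("MKL-DNN available:", torch.backends.mkldnn.is_available())

# Full build configuration string (BLAS backend, ISA flags, etc.)
print(torch.__config__.show())
```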


Yeah, there are a lot of things you can do to make CPU inference faster, sorted by increasing complexity:

  1. Use an m6i or c6i instance; they are much faster
  2. Call torch.set_num_threads(1) in your script
  3. Use Intel IPEX (Intel Extension for PyTorch)
  4. Use Intel CPU launcher script
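Point 2 can be combined with a small timing harness to measure the effect directly. This is a minimal sketch; the Conv2d shape below is a made-up stand-in, since the original model isn't shown:

```python
import time
import torch

# Pin PyTorch to a single intra-op thread: for small models, thread
# launch/sync overhead on many-core cloud CPUs often dominates compute.
torch.set_num_threads(1)

# Hypothetical stand-in for the model's conv2d layer from the question.
conv = torch.nn.Conv2d(3, 16, kernel_size=3)
x = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    conv(x)  # warm-up pass so lazy initialization isn't timed
    start = time.perf_counter()
    for _ in range(10):
        conv(x)
    elapsed = (time.perf_counter() - start) / 10

print(f"avg conv2d forward: {elapsed * 1000:.2f} ms")
```

Try the same script with and without the `set_num_threads(1)` line to see whether thread oversubscription is the culprit on your instance.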

Using these kinds of tricks I managed to get BERT inference from 2s to roughly 20ms

Steps 3 and 4 are integrated into TorchServe, if you’re interested.

You can get more information about IPEX from the GitHub link below. IPEX currently works with the PyTorch 1.10 release.
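The basic IPEX usage is a one-line `ipex.optimize` call on an eval-mode model. A hedged sketch (the import is guarded because the package is optional and must match your torch version; the tiny Sequential model is a placeholder):

```python
import torch

# intel_extension_for_pytorch (IPEX) is an optional add-on package;
# guard the import so the script still runs where it isn't installed.
try:
    import intel_extension_for_pytorch as ipex
except ImportError:
    ipex = None

# Placeholder model standing in for the trained model from the question.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU())
model.eval()

if ipex is not None:
    # Applies CPU-specific weight-layout and operator optimizations.
    model = ipex.optimize(model)

with torch.no_grad():
    out = model(torch.randn(1, 3, 64, 64))
print(out.shape)
```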

And here is its documentation site: