Inference extremely slow on AWS (CPU)?

There are quite a few things you can do to make CPU inference faster, sorted here by increasing complexity:

  1. Use an m6i or c6i instance; they are much faster than older instance generations
  2. Set `torch.set_num_threads(1)` in your script; this avoids thread oversubscription when you run several inference workers on the same host (see the sketch after this list)
  3. Use Intel IPEX (Intel Extension for PyTorch), which applies CPU-specific optimizations such as operator fusion (also in the sketch below)
  4. Use the Intel CPU launcher script, which handles core pinning and memory allocator tuning for you (invocation example at the end of this post)

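As a rough illustration of steps 2 and 3, here is a minimal sketch of an inference script. The model and input are placeholders (swap in your real model, e.g. a Hugging Face BERT), and `ipex.optimize` behavior can vary between IPEX versions, so treat this as a starting point rather than a drop-in recipe:

```python
import time

import torch
import intel_extension_for_pytorch as ipex  # step 3: Intel Extension for PyTorch

# Step 2: one thread per worker process avoids oversubscription
# when several inference workers share the same machine.
torch.set_num_threads(1)

# Placeholder model standing in for your real one.
model = torch.nn.Sequential(
    torch.nn.Linear(768, 768),
    torch.nn.ReLU(),
    torch.nn.Linear(768, 2),
).eval()

# Step 3: let IPEX apply CPU-specific optimizations to the eval-mode model.
model = ipex.optimize(model)

x = torch.randn(1, 768)  # dummy input standing in for tokenized text

with torch.inference_mode():
    model(x)  # warm-up so the timing below isn't dominated by lazy init
    start = time.perf_counter()
    model(x)
    print(f"latency: {(time.perf_counter() - start) * 1000:.2f} ms")
```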
With these kinds of tricks I managed to get BERT inference down from about 2 s to roughly 20 ms.

Steps 3 and 4 are integrated into TorchServe if you're interested: https://github.com/pytorch/serve
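
If you want to try the launcher (step 4) outside TorchServe, the invocation is roughly as below. I'm going from memory on the module path and flags, and they have shifted between IPEX versions (newer releases ship an `ipexrun` entry point), so check `--help` against your install:

```
python -m intel_extension_for_pytorch.cpu.launch --ninstances 1 your_inference_script.py
```

The launcher takes care of core pinning and memory allocator selection, so you don't have to set `OMP_NUM_THREADS` and related environment variables by hand.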