Yeah so there are a lot of things you can do to make CPU inference faster, sorted by increasing complexity:
1. Use an m6i or c6i instance; they are much faster.
2. Set `torch.set_num_threads(1)` in your script (see the sketch after this list).
3. Use Intel IPEX.
4. Use the Intel CPU launcher script.
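Here's a minimal sketch of tricks 2 and 3 together. It assumes a Hugging Face `bert-base-uncased` model (any CPU-friendly torch model works the same way) and that IPEX is installed via `pip install intel-extension-for-pytorch`:

```python
import torch
from transformers import AutoModel, AutoTokenizer  # assumes HF transformers is installed

# One thread per process avoids oversubscription when you run
# several inference workers on the same box.
torch.set_num_threads(1)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

# Optional: IPEX rewrites the model with CPU-specific optimizations
# (op fusion, channels-last layouts). Skipped cleanly if not installed.
try:
    import intel_extension_for_pytorch as ipex
    model = ipex.optimize(model)
except ImportError:
    pass

inputs = tokenizer("Hello world", return_tensors="pt")
with torch.inference_mode():
    outputs = model(**inputs)
```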
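For trick 4, the launcher script handles core pinning and OpenMP/allocator settings for you. If I remember right it's invoked as `python -m intel_extension_for_pytorch.cpu.launch your_script.py` on the IPEX side, and newer PyTorch versions ship a similar upstreamed launcher at `python -m torch.backends.xeon.run_cpu your_script.py`; the exact module path depends on your version, so double-check the docs.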
Using these kinds of tricks, I managed to get BERT inference down from 2s to roughly 20ms.
Tricks 3-4 are integrated into TorchServe if you're interested: https://github.com/pytorch/serve