I’m trying to run inference with a trained model on an AWS EC2 instance (specifically, the c5 series), using CPUs only. However, the forward pass is extremely slow on the EC2 CPU compared with running the same inference on my laptop (2018 MacBook Pro, also CPU-only): the entire forward pass takes 0.05 s on my laptop, while a single line in the forward pass (a call to conv2d) takes 0.15 s on the AWS instance.
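For reference, this is roughly how I isolated the conv2d cost (the tensor and kernel shapes below are illustrative, not my actual model):

```python
import time
import torch

# Print build info (including whether MKL / MKL-DNN are compiled in)
# and the intra-op thread count PyTorch is using on this machine.
print(torch.__config__.show())
print("intra-op threads:", torch.get_num_threads())

# Illustrative shapes -- substitute the real layer from the model.
x = torch.randn(1, 64, 112, 112)
conv = torch.nn.Conv2d(64, 128, kernel_size=3, padding=1)

with torch.no_grad():
    conv(x)  # warm-up call so the timing excludes one-time setup
    t0 = time.perf_counter()
    for _ in range(10):
        conv(x)
    print("avg conv2d time: %.4f s" % ((time.perf_counter() - t0) / 10))
```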
I’m using PyTorch 1.3.1, and my AMI is based on the Deep Learning AMI.
Any ideas why it is so slow, or what I could do to speed it up? (Or, if I should just use an instance with GPUs?)
Edit: I tried timing with and without MKL (like here: Use MKLDNN in pytorch - #4 by LeviViana), and with MKL it was actually slower than without it.
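One thing I’ve since been looking at: thread oversubscription is a common culprit for slow CPU ops on EC2, since PyTorch may spawn more threads than there are physical cores. A sketch of the two knobs involved (the thread count of 4 is illustrative for a c5.xlarge; the `torch.backends.mkldnn.flags` context manager is how recent PyTorch builds toggle MKL-DNN for a region of code):

```python
import torch

# Pin the intra-op thread pool; try matching it to the instance's
# physical core count rather than the default.
torch.set_num_threads(4)
print("intra-op threads:", torch.get_num_threads())

# Run the same op with MKL-DNN disabled for comparison.
with torch.backends.mkldnn.flags(enabled=False):
    x = torch.randn(1, 3, 224, 224)
    conv = torch.nn.Conv2d(3, 16, kernel_size=3)
    y = conv(x)
print(y.shape)
```

Setting the `OMP_NUM_THREADS` environment variable before launching Python is another way to cap the thread count.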