I have been trying to run CPU-accelerated inference for our model. The only change I make from the GPU version is calling torch.set_num_threads(threads). I am running the model on an AWS m5.24xlarge with 96 vCPUs, with num_threads set to 86 and num_workers set to 8. It starts out well, giving an ETA of ~16 hours, but after 40% completion most of the PyTorch processes start bouncing between the "S" (sleeping) and "R" (running) states. It got stuck at 47%, and I had to kill it.
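One thing I suspect is oversubscription: with num_threads=86 and 8 DataLoader workers, each worker process can spin up its own 86-thread intra-op pool, which is far more runnable threads than the 96 vCPUs can service. Here is a minimal sketch of one way to split the CPU budget across workers instead; `thread_budget` is a hypothetical helper (not a PyTorch API), and the actual per-worker call would be `torch.set_num_threads(budget)`:

```python
import os
from typing import Optional

def thread_budget(num_workers: int, total_cpus: Optional[int] = None) -> int:
    """Divide available CPUs among worker processes so their
    intra-op thread pools don't oversubscribe the machine."""
    total = total_cpus or os.cpu_count() or 1
    # Reserve roughly one CPU per worker for the data-loading work itself,
    # then split the rest evenly across workers.
    return max(1, (total - num_workers) // max(1, num_workers))

# With 96 vCPUs and 8 workers, each worker gets ~11 compute threads,
# instead of 8 processes x 86 threads = 688 runnable threads.
budget = thread_budget(num_workers=8, total_cpus=96)

# OMP_NUM_THREADS is read by the OpenMP/MKL backends at startup;
# setting it before launching workers keeps their pools small.
os.environ["OMP_NUM_THREADS"] = str(budget)
# Inside each worker you would then also call torch.set_num_threads(budget).
```

This is only a sketch under the assumption that each worker inherits the full thread setting; the right split depends on how much of your per-sample work is data loading vs. model compute.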
Should I do something else for a CPU-accelerated version?
This is on Ubuntu 18.04 (the Deep Learning AMI available from AWS).