Hello,
I have been trying to run CPU-accelerated inference for our model. The only thing I change from the GPU version to the CPU version is torch.set_num_threads(threads). I am running the model on an AWS m5.24xlarge with 96 vCPUs, with num_threads set to 86 and num_workers set to 8. It starts well, giving an ETA of ~16 hours, but after about 40% completion, most of the PyTorch processes start jumping between the "S" and "R" (sleeping and running) states. It got stuck at 47%, and I had to kill it.
Is there anything else I should be doing for a CPU-accelerated version?
This is on Ubuntu 18.04 (the Deep Learning AMI available from AWS)
Python: 3.6.7
PyTorch: 1.1
The model is here: https://github.com/kishwarshafin/helen/blob/master/modules/python/models/TransducerModel.py
And the way I am running it is here:
Any help would be highly appreciated.