While evaluating a trained PyTorch model on CPU only, inference runs very slowly. The model is quite small, yet by default PyTorch uses all available CPU cores (>16) for inference, which, besides being unnecessary, actually slows inference down dramatically. Calling torch.set_num_threads(1) resolves the issue and speeds up inference immediately.
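For reference, a minimal sketch of the workaround I am using (the model and input here are stand-ins, not my actual network):

```python
import torch
import torch.nn as nn

# Restrict intra-op parallelism to one thread before running inference.
# For small models, the overhead of spawning and synchronizing many
# OpenMP threads can outweigh any gain from parallelism.
torch.set_num_threads(1)

# Stand-in for a small trained model.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

x = torch.randn(1, 16)
out = model(x)
print(out.shape)  # torch.Size([1, 2])
```

Setting the OMP_NUM_THREADS environment variable to 1 before launching Python has a similar effect, since PyTorch's CPU parallelism goes through OpenMP.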
My question is whether I am doing something incorrectly, or whether having to call torch.set_num_threads is expected behavior in such cases.
My server runs Ubuntu 16.04, with PyTorch 0.3.1 installed via pip.
Edit: Not sure if it is relevant, but the model has 1 GRU layer (in addition to 3 convolutional layers and 2 fully connected layers). Also, torch.backends.cudnn.is_acceptable(torch.cuda.FloatTensor(1)) returns True.