CPU Inference is very slow, unnecessarily occupies multiple CPU cores

While evaluating a trained PyTorch model on CPU only, inference runs very slowly. The model is quite small, and calling torch.set_num_threads(1) resolves the issue and speeds up inference immediately. Without it, however, PyTorch uses all available CPU cores (>16) for inference of this relatively small model, which is not only unnecessary but actually slows inference down dramatically.

My question is whether I am doing something wrong, or whether having to call torch.set_num_threads() is expected behavior in such cases.

My server runs Ubuntu 16.04, with PyTorch 0.3.1 installed via pip.

Edit: Not sure if relevant, but the model has 1 GRU layer (in addition to 3 conv layers and 2 FC layers). Also, torch.backends.cudnn.is_acceptable(torch.cuda.FloatTensor(1)) returns True.
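For reference, a minimal timing sketch of the comparison described above. The model here is hypothetical (the conv/GRU/FC sizes are made up to roughly match the description, not taken from my actual code), and it is written against the modern PyTorch API — on 0.3.x you would wrap inputs in `Variable(..., volatile=True)` instead of using `torch.no_grad()`:

```python
import time
import torch
import torch.nn as nn

# Hypothetical small model in the spirit of the one described:
# 3 conv layers, 1 GRU layer, 2 FC layers. Sizes are assumptions.
class SmallModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(8, 16, 3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 16, 3, padding=1), nn.ReLU(),
        )
        self.gru = nn.GRU(16, 32, batch_first=True)
        self.fc = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 4))

    def forward(self, x):
        x = self.conv(x)        # (N, C, T)
        x = x.transpose(1, 2)   # (N, T, C) for the GRU
        out, _ = self.gru(x)
        return self.fc(out[:, -1])

def bench(model, x, iters=50):
    # Time repeated forward passes without gradient tracking.
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
    return time.perf_counter() - start

model = SmallModel().eval()
x = torch.randn(1, 8, 100)

t_default = bench(model, x)   # PyTorch's default thread count
torch.set_num_threads(1)
t_single = bench(model, x)    # forced single-threaded
print(f"default threads: {t_default:.3f}s, one thread: {t_single:.3f}s")
```

On my machine the single-threaded run is the faster of the two; the exact gap will depend on core count and BLAS/OpenMP configuration.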


I am afraid this is possibly expected behaviour. Tuning the threshold for when multithreading should or should not be used on CPU is quite complex, given the wide range of architectures, core counts…
Manually calling torch.set_num_threads() is the right option if you know how many cores your program should use. Especially on CPUs with a large core count, the thresholds can be low.
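Concretely, capping the intra-op thread pool is a one-liner; call it once, early in the program, before any heavy ops run:

```python
import torch

# Cap PyTorch's intra-op parallelism to a single thread.
torch.set_num_threads(1)
assert torch.get_num_threads() == 1
```

If your build uses OpenMP (the common case for pip wheels), the same cap can usually also be applied from outside the process, e.g. `OMP_NUM_THREADS=1 python infer.py`.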

If by any chance you can provide a simple code sample that reproduces this, we can check whether the threshold involved is actually correct or whether this is a bug.