I’m trying to verify that my code (CPU only) is utilizing all available computation resources. Does torch.get_num_threads give the number of threads that are potentially available or that are actually used by the current part of the code?
In the latter case, where should I print its return value for diagnosis, i.e., which parts of the PyTorch code should be using CPU parallelism when it is available?
It prints the total number of threads that are usable (they may or may not be in use at any given moment).
Only heavy CPU computations (mm, element-wise ops, linear algebra…) use these threads at the moment.
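A quick way to see this in practice is to print `torch.get_num_threads()` and run one of those heavy ops. This is a minimal sketch; the matrix sizes are arbitrary, and `torch.set_num_threads` is shown only as a way to compare against a restricted pool:

```python
import time
import torch

# Size of the thread pool PyTorch uses for intra-op parallelism.
print(torch.get_num_threads())

# A large matmul is one of the ops that actually uses that pool.
a = torch.randn(2000, 2000)
b = torch.randn(2000, 2000)

t0 = time.perf_counter()
c = a @ b
print(f"matmul took {time.perf_counter() - t0:.3f}s "
      f"on {torch.get_num_threads()} threads")

# You can also restrict the pool, e.g. to compare single-threaded timing.
torch.set_num_threads(1)
```

Comparing the timing at different `set_num_threads` values is a rough way to confirm the op is parallelized on your machine.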
You need to be careful about which BLAS library you use to get the best possible performance from CPU code. OpenBLAS or MKL are recommended.
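To check which BLAS backend your build is using, you can inspect the build configuration. A small sketch, assuming a standard PyTorch install (the exact contents of the config string depend on how your binary was built):

```python
import torch

# torch.__config__.show() reports the build configuration, including
# which BLAS/LAPACK backend (e.g. MKL or OpenBLAS) PyTorch was compiled with.
print(torch.__config__.show())

# MKL availability can also be queried directly.
print(torch.backends.mkl.is_available())
```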
There is no way that I know of. Mostly because the multi-threading is handled by OpenMP and similar libraries that are not controlled directly by PyTorch but by the other libraries we use.