Using libtorch in OpenMP gets wrong results

lymhust · February 18, 2021, 9:24am

I want to call libtorch functions inside OpenMP parallel regions. However, I ran into a problem:

Only the last thread returns correct results.
The other threads return wrong results.

Has anyone encountered the same problem? Thanks!

lymhust · February 19, 2021, 3:58am

I eventually found that the problem may be related to set_num_interop_threads.
Earlier libtorch releases (e.g. libtorch 1.0.0 and 1.1.0) have no functions such as at::set_num_interop_threads or torch::set_num_interop_threads, and only one thread works correctly when libtorch functions are called in OpenMP.
Starting with libtorch 1.2.0, at::set_num_interop_threads is available in <ATen/Parallel.h> (and, in later versions, torch::set_num_interop_threads in <torch/utils.h>), and the final results are correct.

xym2009 · August 4, 2021, 11:44am

I’ve also used libtorch (GPU) functions in OpenMP, and I notice that the computation time does not drop linearly the way it does for CPU functions. For example, a CPU function needs 1000 ms on one core and 100 ms using OpenMP with 10 cores. For the GPU, however, a function may need 100 ms, while with OpenMP on 10 cores the time may still be 80 ms (GPU memory is sufficient). Do you have any idea about this problem? Thanks a lot!