Using libtorch in OpenMP gets wrong results

I want to call libtorch functions inside an OpenMP parallel region, but I run into the following problems:

  1. Only the last thread returns correct results.
  2. Other threads get wrong results.

Has anyone encountered the same problem? Thanks!

In the end, I found that the problem is probably related to set_num_interop_threads.
Earlier libtorch releases (e.g. libtorch 1.0.0 and libtorch 1.1.0) have no function like at::set_num_interop_threads or torch::set_num_interop_threads, and only one thread works when libtorch functions are called from OpenMP.
From libtorch 1.2.0 on, at::set_num_interop_threads is available in <ATen/Parallel.h> (or torch::set_num_interop_threads in <torch/utils.h> in later versions), and with it the final results are correct.

I’ve also called libtorch GPU functions from OpenMP, and I notice that the computation time does not drop linearly with the number of threads the way it does for CPU functions. For example, a CPU function needs 1000 ms on one core and 100 ms using OpenMP with 10 cores. On the GPU, however, a function may need 100 ms, and with OpenMP on 10 cores the time may still be 80 ms, even though there is enough GPU memory. Do you have any idea what causes this? Thanks a lot!
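
To put numbers on the timings above, here is a small sketch that computes the observed speedup and, using Amdahl's law as an assumed model (it is not something libtorch reports), the fraction of the work that actually scaled with the thread count. The function names speedup and parallel_fraction are hypothetical:

```cpp
#include <cmath>

// Observed speedup: single-worker time divided by multi-worker time.
double speedup(double t_single_ms, double t_parallel_ms) {
    return t_single_ms / t_parallel_ms;
}

// Amdahl's law, S = 1 / ((1 - p) + p / n), solved for the parallel fraction p:
//   p = (1 - 1/S) / (1 - 1/n)
// s is the observed speedup and n is the number of workers (n > 1).
double parallel_fraction(double s, int n) {
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / static_cast<double>(n));
}
```

For the CPU case, speedup(1000.0, 100.0) is 10 and parallel_fraction(10.0, 10) is 1, i.e. perfectly linear scaling. For the GPU case, speedup(100.0, 80.0) is 1.25 and parallel_fraction(1.25, 10) is about 0.22, so under this model only around a fifth of the work benefits from the extra host threads. One plausible reading, which the timings alone do not confirm, is that all 10 threads share the same single GPU, so most of the work still runs one kernel after another on the device.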