OpenMP not enabled when building a C++ extension via pybind

Hi, I am trying to implement a custom operator in C++ for use with PyTorch, via pybind. However, when I run "python setup.py install" (i.e., build the C++ code with ninja), I get this warning:

/home/karthik/anaconda3/envs/pytorch-lib/lib/python3.9/site-packages/torch/include/ATen/ParallelOpenMP.h:87: warning: ignoring ‘#pragma omp parallel’ [-Wunknown-pragmas]
   87 | #pragma omp parallel for if ((end - begin) >= grain_size)

The thing is, I do want to use OpenMP to speed up my code. Is there a way to fix this so that OpenMP is enabled when I build?

A related question: can I just add "#pragma omp parallel for" in front of the loop, or do I have to use the at::parallel_for function? I ask in case the plain pragma causes other issues with PyTorch. Thank you!
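
To make the question concrete, here is a minimal sketch of the kind of loop I mean. The names built_with_openmp and scale_in_place are hypothetical and only for illustration; this is not my actual operator and it does not use any PyTorch API. It relies only on the standard _OPENMP macro, which the compiler defines when OpenMP is switched on (for GCC and Clang that is normally the -fopenmp flag), so it should compile either way and simply run serially when OpenMP is off.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: reports whether this file was compiled with OpenMP
// enabled. _OPENMP is defined by the compiler only when OpenMP is on.
bool built_with_openmp() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// Hypothetical example loop: multiply every element of `data` by `factor`.
void scale_in_place(std::vector<float>& data, float factor) {
    // Signed loop index, which every OpenMP version accepts.
    const long n = static_cast<long>(data.size());

    // The guard keeps a non-OpenMP build free of the -Wunknown-pragmas
    // warning; in that case the loop just runs on a single thread.
#ifdef _OPENMP
    #pragma omp parallel for
#endif
    for (long i = 0; i < n; ++i) {
        data[static_cast<std::size_t>(i)] *= factor;
    }
}
```

Calling built_with_openmp() from the compiled extension would be one way to check whether the build actually picked up OpenMP, and it is a loop like the one in scale_in_place that I would like to parallelize, either with the plain pragma or with at::parallel_for.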