How can I compile PyTorch from source with OpenMP? And why is the OpenMP build compiled with GCC so inefficient?

I compiled PyTorch (1.5.0) from source with the following setup:
GCC 7.4.0
CMake 3.10.3
Python 3.7.6

I also exported the following environment variables:
export USE_CUDA=0
export USE_OPENMP=1
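The two variables above are build-time switches read by PyTorch's build scripts. At run time, OpenMP runtimes such as libgomp also consult variables like OMP_NUM_THREADS, and Linux CPU affinity limits which cores threads may use. A minimal stdlib-only sketch for printing both is below; the helper name is hypothetical and the variable list is a common selection, not an exhaustive one.

```python
import os
from typing import Dict, Optional

# A common (not exhaustive) selection of run-time variables read by OpenMP
# runtimes (GNU libgomp, Intel's libiomp5) and by MKL.
OPENMP_ENV_VARS = (
    "OMP_NUM_THREADS",
    "OMP_PROC_BIND",
    "OMP_PLACES",
    "GOMP_CPU_AFFINITY",
    "KMP_AFFINITY",
    "MKL_NUM_THREADS",
)


def openmp_runtime_report() -> Dict[str, Optional[str]]:
    """Hypothetical helper: OpenMP-related env vars plus this process's CPU affinity."""
    report: Dict[str, Optional[str]] = {
        name: os.environ.get(name) for name in OPENMP_ENV_VARS
    }
    report["cpu_count"] = str(os.cpu_count())
    if hasattr(os, "sched_getaffinity"):
        # Linux only: cores the calling thread may run on. A single entry here
        # means threads it spawns inherit a one-core mask by default.
        allowed = sorted(os.sched_getaffinity(0))
        report["allowed_cpus"] = ",".join(str(c) for c in allowed)
    else:
        report["allowed_cpus"] = None
    return report


if __name__ == "__main__":
    for key, value in openmp_runtime_report().items():
        print(f"{key:18} {value}")
```

Running it in the shell used for the source build and in the one used for the pip install, ideally after `import torch`, may reveal a difference such as OMP_NUM_THREADS=1 or a single allowed CPU.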

The build succeeded, but it does not seem to use multiple CPU cores (or at least does not make full use of them). I tried different numbers of threads and the execution time did not change, like this:

When I monitored CPU utilization, I found that only one core was at 100 percent, while all the others were at 3 to 4 percent.
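A small timing harness can make the "more threads, same time" observation reproducible. This is a sketch under assumptions: `measure_scaling` is a hypothetical helper, and the suggested workload (an elementwise `torch.exp` on a large tensor, which PyTorch's intra-op parallelism normally splits across threads) stands in for the original benchmark, which is not shown. The thread count is set through `torch.set_num_threads`.

```python
import time
from typing import Callable, Dict, Iterable


def measure_scaling(
    workload: Callable[[], object],
    set_threads: Callable[[int], None],
    thread_counts: Iterable[int],
    repeats: int = 3,
) -> Dict[int, float]:
    """Return the best wall-clock time in seconds of `workload` for each thread count."""
    results: Dict[int, float] = {}
    for n in thread_counts:
        set_threads(n)
        workload()  # untimed warm-up so thread-pool start-up is not counted
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            workload()
            best = min(best, time.perf_counter() - start)
        results[n] = best
    return results


# Usage with PyTorch (not executed here; requires torch):
#   import torch
#   x = torch.randn(20_000_000)
#   times = measure_scaling(lambda: torch.exp(x), torch.set_num_threads, [1, 2, 4, 8])
#   for n, t in times.items():
#       print(f"{n} threads: {t:.4f} s")
```

In a build where OpenMP works, one would typically expect the time to fall as the thread count grows toward the number of physical cores; roughly constant times, together with a single busy core, would match the behavior described above.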

Then I installed PyTorch with pip instead, and it behaves differently. What causes this?
I printed the shared libraries (DLLs) each version links against:
Built from source:

Installed with pip:

The libgomp.so is different. Is that the reason?
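To compare what each installation actually loads at run time, one option is to list the OpenMP runtimes mapped into the Python process after `import torch`. The sketch below is Linux-only and hypothetical: the helper names and the file-name pattern (system `libgomp.so.1`, Intel `libiomp5.so`, LLVM `libomp.so`, and hashed copies such as `libgomp-<hash>.so.1` that wheels may bundle) are assumptions.

```python
import re
from typing import List

# Assumed pattern for OpenMP runtime file names, including hashed copies
# like "libgomp-<hash>.so.1" that bundled wheels may ship.
OPENMP_LIB_RE = re.compile(r"lib(?:gomp|iomp5|omp)[^/\s]*\.so[^/\s]*$")


def openmp_libs_in_maps(maps_text: str) -> List[str]:
    """Return the unique OpenMP runtime paths listed in /proc/<pid>/maps text."""
    found = set()
    for line in maps_text.splitlines():
        fields = line.split()
        # File-backed mappings have 6 fields; the last one is the path.
        if len(fields) >= 6 and OPENMP_LIB_RE.search(fields[-1]):
            found.add(fields[-1])
    return sorted(found)


def loaded_openmp_libs() -> List[str]:
    """Linux only: OpenMP runtimes mapped into the current process."""
    with open("/proc/self/maps") as f:
        return openmp_libs_in_maps(f.read())
```

For example, running `import torch` and then `print(loaded_openmp_libs())` in each environment shows which runtime file each one uses, and whether more than one OpenMP runtime ends up in the same process.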