Problem Description:
I wanted to test the CPU utilization of PyTorch matrix multiplication, so I ran the following benchmark:
import timeit
import torch

runtimes = []
threads = [1] + [t for t in range(2, 49, 2)]
for t in threads:
    # Set the thread count used for CPU ops, then time 100 matrix multiplications
    torch.set_num_threads(t)
    r = timeit.timeit(setup="import torch; x = torch.randn(1024, 1024); y = torch.randn(1024, 1024)", stmt="torch.mm(x, y)", number=100)
    runtimes.append(r)
However, I found a weird problem: the CPU cores are very unevenly loaded. One core has a very high load while the others have a very low load, just like this:
I compiled PyTorch 1.4.1 with gcc 7.4.0 and set the environment variable USE_OPENMP=1 during the build. What could cause this? Any help would be appreciated. Thanks!
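In case it helps with diagnosis, here is a minimal sketch that prints the thread-related settings visible to the Python process. It rests on assumptions: the environment variable names listed are ones commonly read by OpenMP/MKL runtimes and may not all apply to this build, and `collect_thread_settings` is just a hypothetical helper name. The torch part is skipped if torch cannot be imported.

```python
import os

# Environment variables commonly read by OpenMP/MKL runtimes.
# Assumption: which of these matter depends on the runtime this build links against.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "MKL_NUM_THREADS",
    "OMP_PROC_BIND",
    "GOMP_CPU_AFFINITY",
    "KMP_AFFINITY",
)


def collect_thread_settings():
    """Return the thread-related settings visible to this process (hypothetical helper)."""
    settings = {name: os.environ.get(name) for name in THREAD_ENV_VARS}
    settings["cpu_count"] = os.cpu_count()

    # sched_getaffinity is only available on some platforms (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        settings["allowed_cpus"] = sorted(os.sched_getaffinity(0))
    else:
        settings["allowed_cpus"] = None

    try:
        import torch
    except ImportError:
        settings["torch_num_threads"] = None
        settings["torch_parallel_info"] = None
    else:
        # Thread count torch will use, and the parallel backend it reports.
        settings["torch_num_threads"] = torch.get_num_threads()
        settings["torch_parallel_info"] = torch.__config__.parallel_info()
    return settings


if __name__ == "__main__":
    for key, value in collect_thread_settings().items():
        print(f"{key}: {value}")
```

If `allowed_cpus` lists fewer cores than expected, or one of the affinity variables is set, that might be worth ruling out before looking at the build itself.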