Why can't I take full advantage of all the CPU cores?

Problem Description:
I wanted to test the CPU utilization of PyTorch matrix multiplication, so I ran the following benchmark:

import timeit

import torch

runtimes = []
threads = [1] + list(range(2, 49, 2))  # 1, 2, 4, ..., 48
for t in threads:
    torch.set_num_threads(t)  # set the number of intra-op threads to t
    r = timeit.timeit(
        setup="import torch; x = torch.randn(1024, 1024); y = torch.randn(1024, 1024)",
        stmt="torch.mm(x, y)",
        number=100,
    )
    runtimes.append(r)  # total seconds for 100 multiplications

However, I found a strange problem: the CPU cores are very unevenly loaded. One core has a very high load while the others have a very low load. It looks like this:

I compiled PyTorch version 1.4.1 from source with gcc 7.4.0 and set the environment variable USE_OPENMP=1 for the build. What could be causing this? Thanks!
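
In case it helps narrow this down, here is a minimal sketch of a check that reports the threading-related settings visible to the process. The helper name `threading_report` and the choice of environment variables are my own assumptions about what might be relevant, not a confirmed diagnosis. The torch part is skipped if torch cannot be imported, and the CPU affinity query is only available on platforms that provide `os.sched_getaffinity` (for example Linux).

```python
import os

# Environment variables that commonly influence OpenMP/MKL threading.
# (Assumed to be relevant here; this list is illustrative, not exhaustive.)
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "MKL_NUM_THREADS",
    "OMP_PROC_BIND",
    "GOMP_CPU_AFFINITY",
    "KMP_AFFINITY",
)


def threading_report():
    """Collect settings that can restrict how many cores a process uses."""
    report = {"cpu_count": os.cpu_count()}

    # Cores this process is allowed to run on (not available on all platforms).
    if hasattr(os, "sched_getaffinity"):
        report["affinity"] = sorted(os.sched_getaffinity(0))
    else:
        report["affinity"] = None

    # None means the variable is not set in this environment.
    report["env"] = {name: os.environ.get(name) for name in THREAD_ENV_VARS}

    # Optional: what PyTorch itself reports about its parallel backend.
    try:
        import torch

        report["torch_num_threads"] = torch.get_num_threads()
        report["torch_parallel_info"] = torch.__config__.parallel_info()
    except ImportError:
        report["torch_num_threads"] = None
        report["torch_parallel_info"] = None

    return report


if __name__ == "__main__":
    for key, value in threading_report().items():
        print(f"{key}: {value}")
```

If the affinity list contains only one core, or one of the variables above pins or limits the threads, that might explain the uneven load. The `torch_parallel_info` entry should also show whether the build actually picked up OpenMP.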