Why does conv2d() use kernel space when there is only one thread?

Hi all,
I am currently learning PyTorch. While running convolutional neural network code, I tested the conv2d() function on the CPU and used "top" to monitor performance.
When I set the number of threads to 1, I found that "%sy" in top's output reached 40-60% (sy: the percentage of CPU time spent in kernel space).
Is this normal? Can anyone explain why so much time is spent in kernel space?

import os

cpu_num = 1
print("cpu_num:", cpu_num)
# Set these before importing torch so the OpenMP/BLAS thread pools pick them up
os.environ["OMP_NUM_THREADS"] = str(cpu_num)
os.environ["OPENBLAS_NUM_THREADS"] = str(cpu_num)
os.environ["MKL_NUM_THREADS"] = str(cpu_num)
os.environ["VECLIB_MAXIMUM_THREADS"] = str(cpu_num)
os.environ["NUMEXPR_NUM_THREADS"] = str(cpu_num)

import torch

torch.set_num_threads(cpu_num)
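For reference, here is a minimal single-threaded benchmark I could use to reproduce the observation; the layer shapes, batch size, and iteration count are arbitrary choices for illustration, not the exact workload from my original run:

```python
import os

# Set thread-count env vars before importing torch so they take effect
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import torch
import torch.nn as nn

torch.set_num_threads(1)

# Arbitrary sizes chosen for illustration
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
x = torch.randn(4, 3, 128, 128)

# Run the convolution repeatedly while watching "top" in another terminal
with torch.no_grad():
    for _ in range(50):
        y = conv(x)

print(torch.get_num_threads())  # 1
print(y.shape)                  # torch.Size([4, 64, 128, 128])
```

Running this loop while observing %us vs. %sy in top shows how the time splits between user and kernel space even with a single thread.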

Many thanks!