Is it possible to use multi-cpu acceleration for tensor operations in Jupyter Notebook?

I am developing in a Jupyter Notebook and trying to see whether I can use all 48 cores / 96 threads of my CPU to accelerate tensor operations in PyTorch, since Task Manager shows only about 2% CPU utilization while I am running operations like matrix multiplication.

However, no matter what I do, I cannot speed up the tensor operation: when I compare the runtime between 1 thread and many, the benchmark stays almost the same. It makes me wonder whether Jupyter Notebook itself is limiting multiprocessing, or whether there is something else I'm missing. Can I get some guidance on this?

Could you describe how exactly you are comparing the single vs. multi core execution?

Hello!

I was trying to run it like this on a Jupyter Notebook:

import torch
import time
import os

print("Logical CPUs:", os.cpu_count())

N = 4096
A = torch.randn(N, N)
B = torch.randn(N, N)

def benchmark(num_threads, runs=10):
    torch.set_num_threads(num_threads)

    # Warm-up
    for _ in range(3):
        _ = A @ B

    start = time.perf_counter()
    for _ in range(runs):
        _ = A @ B
    end = time.perf_counter()

    return (end - start) / runs

t1 = benchmark(1)
t24 = benchmark(24)

print(f"1 thread: {t1:.6f} s")
print(f"24 threads:{t24:.6f} s")
print(f"Speedup: {t1 / t24:.2f}x")

The output is this:

Logical CPUs: 96
1 thread: 0.936561 s
24 threads:0.937496 s
Speedup: 1.00x
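For reference, `torch.utils.benchmark.Timer` can run the same comparison with warm-up and per-measurement thread configuration handled internally, which rules out timing-methodology issues (a sketch, assuming a reasonably recent PyTorch):

```python
import torch
from torch.utils import benchmark

N = 4096
A = torch.randn(N, N)
B = torch.randn(N, N)

results = {}
for nt in (1, 24):
    # Timer sets the intra-op thread count for the duration of the measurement
    timer = benchmark.Timer(stmt="A @ B", globals={"A": A, "B": B}, num_threads=nt)
    results[nt] = timer.timeit(5).mean  # mean seconds per run
    print(f"{nt} threads: {results[nt]:.6f} s")
```

If this also shows no scaling, the thread pool setting is likely not the bottleneck (e.g. the BLAS backend may be pinned by an environment variable such as `OMP_NUM_THREADS` set before the process started).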