PyTorch unable to fully utilize CPU when performing SVD

When computing the SVD of a large matrix, I find that PyTorch cannot utilize all CPU cores on my machine (Apple M1 Pro, 6 performance + 2 efficiency cores; PyTorch 2.1.1).

import torch
torch.set_default_dtype(torch.float64)

a = torch.rand((1024, 1024), dtype=torch.complex128)
u, s, v = torch.linalg.svd(a, full_matrices=False)

PyTorch only uses about 230% CPU, and %timeit reports 1.3 s for the SVD. However, if I run similar code with NumPy, CPU usage goes up to about 550% (yet, curiously, the execution time is about the same as PyTorch's).
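One diagnostic worth including (a minimal sketch, assuming only that PyTorch is installed) is how many threads PyTorch's intra-op parallelism is configured to use, and which BLAS/LAPACK backend the build is linked against:

```python
import torch

# Number of threads PyTorch uses for intra-op parallelism,
# i.e. within a single operation such as one SVD call.
print("intra-op threads:", torch.get_num_threads())

# Build and parallelism details: linked BLAS/LAPACK backend,
# OpenMP settings, MKL availability, etc.
print(torch.__config__.parallel_info())

# The thread count can be raised explicitly; whether the SVD actually
# scales with it depends on the underlying LAPACK routine.
torch.set_num_threads(8)
```

Raising the thread count alone may not help if the bottleneck is inside the LAPACK driver rather than in PyTorch's own thread pool.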

import numpy as np

a = np.random.rand(1024, 1024) + 1.0j * np.random.rand(1024, 1024)
u, s, v = np.linalg.svd(a, full_matrices=False)
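For comparison, NumPy's CPU usage depends on which BLAS/LAPACK library its wheel was built against (e.g. OpenBLAS or Accelerate on Apple Silicon); a quick way to check (assuming a reasonably recent NumPy) is:

```python
import numpy as np

# Prints the BLAS/LAPACK libraries NumPy is linked against; the backend
# determines how the complex SVD driver parallelizes across cores.
np.show_config()
```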

Julia can also use all cores; @btime from BenchmarkTools reports 1.04 s.

using LinearAlgebra

a = rand(ComplexF64, (1024, 1024))
u, s, v = svd(a)

How can I make PyTorch utilize all CPU cores and possibly speed up the SVD?
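To narrow down whether the thread count is the limiting factor, here is a minimal timing experiment (a sketch, assuming PyTorch is installed; the matrix size matches the one above):

```python
import time
import torch

a = torch.rand((1024, 1024), dtype=torch.complex128)

# Time the same SVD with different intra-op thread counts to see
# whether the operation scales with threads at all.
for n_threads in (1, 2, 4, 8):
    torch.set_num_threads(n_threads)
    start = time.perf_counter()
    torch.linalg.svd(a, full_matrices=False)
    elapsed = time.perf_counter() - start
    print(f"{n_threads} threads: {elapsed:.3f} s")
```

If the timings barely change, the SVD is likely bottlenecked inside the LAPACK driver rather than by PyTorch's thread settings.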