Solving linear equations is very slow

Hi,
I have run into the same problem: GPU computation is slower than CPU when I use BLAS/LAPACK-style operations such as torch.solve in PyTorch.
I wonder whether the framework mainly accelerates general tensor operations on the GPU (element-wise operations, tensor contractions, indexing, and so on), while for specialized functions and methods (e.g. the various matrix decompositions) the performance gain is limited. Does that case need further GPU configuration?
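
For reference, here is a minimal sketch of how the CPU and GPU timings could be compared fairly. It is not a definitive benchmark. The helper name `time_solve`, the matrix sizes, and the repeat count are my own choices for illustration. It uses `torch.linalg.solve(A, b)`, which replaces `torch.solve` in recent PyTorch releases (note the swapped argument order). CUDA kernels are launched asynchronously and the first CUDA call pays a one-time initialization cost, so the sketch does a warm-up call and synchronizes before reading the clock.

```python
import time

import torch


def time_solve(n, device, repeats=5):
    """Time torch.linalg.solve on a random n x n system (hypothetical helper).

    Returns (average seconds per solve, max absolute residual of the last solve).
    """
    # Add n * I so the random matrix is well conditioned (assumption for the demo).
    A = torch.randn(n, n, device=device) + n * torch.eye(n, device=device)
    b = torch.randn(n, 1, device=device)

    # Warm-up call: excludes one-time setup cost from the measurement.
    x = torch.linalg.solve(A, b)
    if device == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work before timing

    start = time.perf_counter()
    for _ in range(repeats):
        x = torch.linalg.solve(A, b)
    if device == "cuda":
        torch.cuda.synchronize()  # make sure the GPU has actually finished
    seconds = (time.perf_counter() - start) / repeats

    residual = (A @ x - b).abs().max().item()
    return seconds, residual


if __name__ == "__main__":
    devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
    for n in (64, 512, 2048):  # sizes chosen arbitrarily for illustration
        for device in devices:
            seconds, residual = time_solve(n, device)
            print(f"n={n:5d} device={device:4s} "
                  f"{seconds * 1e3:9.3f} ms/solve residual={residual:.2e}")
```

If the GPU only loses for small n, the likely cause is launch and transfer overhead rather than configuration, but that is something to confirm on your own hardware.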