torch.cholesky_solve is slower than torch.solve

For a symmetric positive definite matrix A, the linear system

$$A x = b$$

can be solved with either torch.cholesky_solve or torch.solve. I would expect torch.cholesky_solve to be faster, because it exploits the positive definite property. In practice, however, torch.cholesky_solve is slower than torch.solve. For a matrix of shape 128×128, the timings are:

torch.cholesky_solve 10.0ms
torch.solve 7.03ms

(NVIDIA GPU Titan X performance)
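
For reference, a comparison like this could be reproduced with a sketch along the following lines. It is not the script that produced the numbers above. Several things here are assumptions: the helper names `make_spd` and `timed` are hypothetical, the right-hand side is a single column, `torch.linalg.solve` stands in for `torch.solve` (which newer PyTorch releases deprecate in its favor), and the Cholesky factorization is counted as part of the Cholesky timing. The explicit `torch.cuda.synchronize()` calls matter because CUDA kernels launch asynchronously, so wall-clock timing without them can be misleading.

```python
import time
import torch


def make_spd(n, device, dtype=torch.float64):
    """Build a well-conditioned symmetric positive definite matrix (hypothetical helper)."""
    m = torch.randn(n, n, device=device, dtype=dtype)
    return m @ m.T + n * torch.eye(n, device=device, dtype=dtype)


def timed(fn, device, repeats=20):
    """Return fn's last result and its mean wall-clock time in milliseconds."""
    fn()  # warm-up call so one-time setup cost is not measured
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for pending kernels before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        out = fn()
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for the timed kernels to finish
    elapsed_ms = (time.perf_counter() - start) / repeats * 1e3
    return out, elapsed_ms


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.manual_seed(0)
A = make_spd(128, device)
B = torch.randn(128, 1, device=device, dtype=A.dtype)


def solve_cholesky():
    # Assumption: the factorization is included in the measured time.
    L = torch.linalg.cholesky(A)      # lower-triangular factor, A = L L^T
    return torch.cholesky_solve(B, L)  # solves A x = B given L


def solve_lu():
    # General LU-based solver; stands in for the older torch.solve.
    return torch.linalg.solve(A, B)


x_chol, ms_chol = timed(solve_cholesky, device)
x_lu, ms_lu = timed(solve_lu, device)
print(f"cholesky + cholesky_solve: {ms_chol:.3f} ms")
print(f"linalg.solve:              {ms_lu:.3f} ms")
```

If the same matrix is solved against many right-hand sides, moving the `torch.linalg.cholesky` call out of the timed function would measure only the triangular solves, which may change the comparison.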

Why is this the case, and how can I achieve the best performance? Thank you!