Scaling linalg ops

While working on linear inverse and matrix determinant problems, I’ve found that some of PyTorch’s matrix factorization algorithms scale well to “batch” mode (i.e., batches of matrices rather than single matrices) and others do not. It would be useful to have this documented somewhere for users, or at least some general guidance on what to expect.

My findings so far:

  1. Cholesky factorization scales well to batch (although it only applies to symmetric positive definite matrices)
  2. LU factorization scales well to batch (e.g., torch.solve, torch.linalg.solve)
  3. Symmetric eigendecomposition does not scale to batch (neither torch.symeig(*, eigenvectors=True) nor torch.linalg.eigh)
  4. QR factorization does not scale to batch (neither torch.qr nor torch.linalg.qr)
  5. torch.lstsq has no batch implementation (I believe this uses QR under the hood?)
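For reference, here is a minimal sketch of the batched APIs behind the findings above; all calls accept a leading batch dimension, and the shapes and sizes are arbitrary choices for illustration:

```python
import torch

torch.manual_seed(0)
B, n = 64, 32  # batch size and matrix dimension (arbitrary)

# Batch of symmetric positive definite matrices
A = torch.randn(B, n, n)
spd = A @ A.transpose(-2, -1) + n * torch.eye(n)

# 1. Cholesky: one call factors the whole batch
L = torch.linalg.cholesky(spd)     # (B, n, n)

# 2. LU-based solve, also batched
b = torch.randn(B, n, 1)
x = torch.linalg.solve(spd, b)     # (B, n, 1)

# 3. Symmetric eigendecomposition: the API is batched,
#    but (per the findings above) runtime grows with B
w, V = torch.linalg.eigh(spd)      # (B, n), (B, n, n)

# 4. QR: same story -- batched API, sequential-like runtime
Q, R = torch.linalg.qr(A)          # (B, n, n) each
```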

Other thoughts:

A) LDL factorization
I can’t find an implementation of LDL factorization or LDL-based solvers. This is usually the second-best choice after Cholesky for symmetric matrices (i.e., when they are symmetric but not positive definite). It would be great to see an implementation in the future.
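Until an LDL routine lands, one batched stopgap for symmetric indefinite systems is a plain LU solve, which ignores the symmetry (roughly twice the flops of LDL) but keeps the good batch scaling noted above. A minimal sketch:

```python
import torch

torch.manual_seed(0)
B, n = 16, 8  # arbitrary sizes for illustration

# Batch of symmetric but indefinite matrices
A = torch.randn(B, n, n)
A = 0.5 * (A + A.transpose(-2, -1))
b = torch.randn(B, n, 1)

# Cholesky would fail here (matrices are not PD); an LU-based
# solve still works, batched, at the cost of not exploiting symmetry.
x = torch.linalg.solve(A, b)       # (B, n, 1)
```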

B) Least-squares and least-norm
Currently torch.lstsq has no batch implementation. Given how well Cholesky scales, it would make sense to offer a Cholesky-based method that solves the normal equations A^T A x = A^T b (with diagonal Tikhonov regularization for the least-norm case of m < n). This is a classical approach in the literature.
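A sketch of that normal-equations approach using the existing batched Cholesky routines (the regularization weight `lam` is an illustrative choice, not a recommendation):

```python
import torch

torch.manual_seed(0)
B, m, n = 32, 50, 10   # batch of overdetermined systems (m > n)
A = torch.randn(B, m, n)
b = torch.randn(B, m, 1)
lam = 1e-6             # Tikhonov weight; any lam > 0 also covers m < n

At = A.transpose(-2, -1)
# Form A^T A + lam*I (SPD for lam > 0) and A^T b, then solve
# via batched Cholesky.
AtA = At @ A + lam * torch.eye(n)
Atb = At @ b
L = torch.linalg.cholesky(AtA)
x = torch.cholesky_solve(Atb, L)   # (B, n, 1)
```

The usual caveat applies: forming A^T A squares the condition number, so this trades some accuracy for batch throughput relative to a QR-based solver.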

In general, what I mean by “scale well” is that the algorithm runs in roughly constant time with respect to batch size, as though all batch entries were processed in parallel, rather than in linear time, as though they were processed sequentially.
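One way to check that definition empirically is to time a fixed-size problem across batch sizes; a rough sketch (the sizes, repetition count, and the choice of cholesky vs. eigh as the two probes are all arbitrary):

```python
import time
import torch

def time_batched(fn, B, n=64, reps=3):
    """Median wall time of fn over a batch of B random SPD matrices."""
    A = torch.randn(B, n, n)
    spd = A @ A.transpose(-2, -1) + n * torch.eye(n)
    ts = []
    for _ in range(reps):
        t0 = time.perf_counter()
        fn(spd)
        ts.append(time.perf_counter() - t0)
    return sorted(ts)[len(ts) // 2]

# If a routine "scales well", its time should grow much more slowly
# than B; a sequential fallback grows roughly linearly in B.
for B in (1, 16, 256):
    t_chol = time_batched(torch.linalg.cholesky, B)
    t_eigh = time_batched(torch.linalg.eigh, B)
    print(f"B={B:4d}  cholesky={t_chol:.4f}s  eigh={t_eigh:.4f}s")
```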