Hello everybody! I have the following problem: given an arbitrary non-square matrix W, I need to get the k singular vector pairs corresponding to the k largest singular values.

An obvious way of doing this is to compute the entire SVD and then keep only the first k. This is, however, very inefficient when k is small, say k = 1. Is there any way to get these vector pairs more efficiently?

You should be able to leverage torch.lobpcg(), which calculates the k
largest eigenvalues of a symmetric-positive-definite matrix.

Let’s assume that W is an m x n matrix, with m <= n. Then W @ W.T
will be an m x m symmetric-positive-semidefinite matrix. I imagine that
you can fudge this by adding a small epsilon to the diagonal, if needed*.

torch.lobpcg() will give you the top k eigenvalues of W @ W.T, which
are the squares of the top k singular values of W (in fact, this is a
common definition of the singular values), together with the associated
eigenvectors.

*) I haven’t tried it, but my intuition suggests that if none of the k singular
values of W that you ask for are zero, the lobpcg() algorithm might still
work without the epsilon fudge, even if W does have some singular values
that are zero.
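Here is a minimal sketch of what I have in mind (untested in the sense of the footnote above; the epsilon value, sizes, and variable names are just illustrative choices). The left singular vectors come straight out of lobpcg(); the right ones can be recovered as W.T @ U divided by the singular values:

```python
import torch

torch.manual_seed(0)
m, n, k = 50, 200, 3                       # assume m <= n
W = torch.randn(m, n, dtype=torch.float64)  # double precision for a tight check

# W @ W.T is symmetric positive-semidefinite; add a small epsilon
# to the diagonal to make it safely positive-definite for lobpcg().
A = W @ W.T + 1e-6 * torch.eye(m, dtype=W.dtype)

# Top-k eigenpairs of A: eigenvalues are the squared singular values of W,
# eigenvectors are the left singular vectors.
eigvals, U = torch.lobpcg(A, k=k, largest=True)
sigma = eigvals.clamp_min(0).sqrt()        # top-k singular values
V = (W.T @ U) / sigma                      # corresponding right singular vectors

# Sanity check against the full SVD
S_ref = torch.linalg.svdvals(W)
print(sigma, S_ref[:k])
```

The recovery of V uses W = U S V^T, so W.T @ U = V S; the epsilon only shifts the eigenvalues by 1e-6, which is negligible next to squared singular values of order m.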

Hey @KFrank, thanks for the fast reply. This was actually the first thing I tried; it seems to be relatively slow, but at least the answer is exact. I resorted to a power-iteration approach, which gives me the first singular pair in reasonable time, but it would be nice to use only PyTorch-optimized functions. I still rely on a for loop (which is, by the way, the same approach PyTorch's spectral norm uses).
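For reference, the power iteration I use looks roughly like this (the function name and iteration count are my own; torch.nn.utils.spectral_norm applies the same alternating u/v updates, just with very few iterations per step):

```python
import torch

def top_singular_pair(W, n_iter=200, eps=1e-12):
    """Power iteration for the largest singular triplet of W:
    alternately multiply by W and W.T and renormalize."""
    m, n = W.shape
    v = torch.randn(n)
    v = v / v.norm()
    for _ in range(n_iter):
        u = W @ v
        u = u / (u.norm() + eps)
        v = W.T @ u
        v = v / (v.norm() + eps)
    sigma = u @ W @ v          # Rayleigh-quotient estimate of sigma_1
    return sigma, u, v

torch.manual_seed(0)
W = torch.randn(40, 60)
sigma, u, v = top_singular_pair(W)
print(sigma, torch.linalg.svdvals(W)[0])
```

The convergence rate depends on the gap between the first and second singular values, which is why it is only approximate for a fixed iteration budget.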

I wonder whether it's possible to, e.g., modify the torch SVD code directly to fit my needs; however, it seems like it's in C and not in Python/PyTorch.

Does it seem slower than it ought to be, or is it just slower than what
you would like for your use case? (In general, pytorch does a good job
with these sorts of things, but I wouldn’t rule out a bug or unnecessary
inefficiency, and if there is one, it would be worth tracking down.)

There are two probable issues with doing so:

First, the algorithm for calculating a small number of singular values (or,
analogously, eigenvalues) is rather different from the one used for a full
singular-value decomposition (or, analogously, eigendecomposition).

So starting with pytorch’s SVD code would likely lead you to writing an
algorithm that would not be efficient when calculating just a few singular
values.

Second, do you need to backpropagate through your top few singular
vectors? If so, I expect that you would also have to implement the gradient
(backward()) companion to your singular-vector calculation. This is
because I doubt very much that pytorch’s SVD algorithm is written using
differentiable pytorch tensor operations. Therefore neither pytorch’s SVD
algorithm, nor your slimmed-down “top-k” version will get autograd “for
free.” Instead, you would have to implement the gradient computation
explicitly.

It is significantly slower than using power iteration (which is approximate, though). I was thinking of integrating SciPy's methods into PyTorch; however, they seem to use the same algorithm in this naive way, i.e., by transforming the matrix W into the symmetric positive-semidefinite matrix W.t() @ W and then computing the eigenpairs.

You are probably right. Right now I am using this in a research context, so I will be fine using the full SVD as implemented in PyTorch and then selecting only the top-k entries. This obviously does not yield the speedup, but at least it gives me correct results.
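For the record, the fallback is just the thin SVD plus slicing; passing full_matrices=False at least avoids materializing the full orthogonal bases (the sizes here are arbitrary):

```python
import torch

torch.manual_seed(0)
W = torch.randn(100, 300)
k = 5

# Thin SVD: U is (100, 100), S is (100,), Vh is (100, 300).
U, S, Vh = torch.linalg.svd(W, full_matrices=False)

# Keep only the leading k singular triplets.
U_k, S_k, V_k = U[:, :k], S[:k], Vh[:k].T
```

By the Eckart-Young theorem, (U_k * S_k) @ V_k.T is the best rank-k approximation of W in the Frobenius norm, with residual norm equal to the norm of the discarded singular values.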

No, I don’t need gradients at all; actually, the SVD is performed on the gradients of my parameters.