How can I handle this error "svd_cuda: the updating process of SBDSDC did not converge"

Hi all,
While I am running the following code on CUDA, where A and B are 2D tensors with shape [128, 128],
C = A.pinverse().matmul(B)
I get the error: svd_cuda: the updating process of SBDSDC did not converge.
How can I handle this error?
Any help would be appreciated.

Well I would say it probably means your matrix is not invertible…?

You can check
https://www.planetmath.org/Pseudoinverse


Do you mean that by its dimensions (n-by-n) it is not invertible? Or is it related to the entries (I mean the values) of the matrix?
What is the solution? How can I compute the inverse of the matrix A?
Actually, I need the matrix C to calculate eigenvalues and eigenvectors.

I mean, you should check that the matrix you want to invert satisfies the theoretical properties of an invertible matrix.

It looks like in your case it doesn't. Moreover, since A is square, you can compute its inverse directly:
https://pytorch.org/docs/stable/generated/torch.linalg.inv.html#torch.linalg.inv
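
For instance, a minimal sketch along those lines (assuming A is genuinely invertible; the shapes are just taken from the question):

import torch

# A and B are square [128, 128] tensors on the gpu, as in the question
A = torch.randn(128, 128, device='cuda')
B = torch.randn(128, 128, device='cuda')

# explicit inverse -- raises an error if A is singular
C = torch.linalg.inv(A).matmul(B)

# solving the linear system A X = B is usually preferred over forming the inverse
C_alt = torch.linalg.solve(A, B)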


Hi S!

First, to be clear, this is not about whether A is invertible. The whole
purpose of the pseudo-inverse (torch.pinverse()) is to perform a
mathematically useful operation even when the matrix is not invertible.
(And the pseudo-inverse coincides with the regular matrix inverse
(torch.inverse()) when the matrix is invertible.)

As to your specific problem, let me first note that I cannot reproduce
your exact error message (but I can cause other errors).

But things to try:

First, check that A does not contain infs or nans (A.isinf().any(),
A.isnan().any()). Either could lead to a pinverse() error.
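
For example (a minimal sketch; safe_pinverse is just a hypothetical helper name):

import torch

def safe_pinverse(A):
    # infs or nans in A will break the svd underlying pinverse(), so fail early
    if A.isinf().any() or A.isnan().any():
        raise ValueError('A contains inf or nan entries')
    return A.pinverse()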

Try upgrading to the latest stable pytorch version or the nightly build.
(By the way, which pytorch version are you using?)

Try using pinverse()'s rcond argument. You will have to experiment
with the value, e.g., A.pinverse (rcond = 1.e-2), and see whether
larger values eliminate your error.
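
Something like this sketch could help with the experimentation (the particular rcond values are just illustrative, and pinv_with_rcond_sweep is a hypothetical helper):

import torch

def pinv_with_rcond_sweep(A, rconds=(1.e-6, 1.e-4, 1.e-2)):
    # larger rcond cutoffs discard more of the small singular values,
    # which can let the svd-based pinverse succeed on ill-conditioned input
    last_err = None
    for rcond in rconds:
        try:
            return A.pinverse(rcond=rcond)
        except RuntimeError as err:
            last_err = err
    raise last_err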

Replace your specific usage of pinverse() (namely multiplying B) with
C = torch.linalg.lstsq (A, B).solution.

Try perturbing A before calculating its pseudo-inverse:
A += 1.e-3 * torch.randn (A.shape). Again, play around with
the scale factor.
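
A sketch of that idea, sweeping a few scale factors (the scales and the helper name are just illustrative):

import torch

def pinv_with_perturbation(A, scales=(1.e-6, 1.e-4, 1.e-3)):
    # add a little noise to break up repeated / degenerate singular values,
    # increasing the scale until the svd converges
    last_err = None
    for scale in scales:
        A_pert = A + scale * torch.randn(A.shape, device=A.device)
        try:
            return A_pert.pinverse()
        except RuntimeError as err:
            last_err = err
    raise last_err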

Perform the call to pinverse() on the cpu. It will be a pain if you need
to backpropagate through this step, because you will have to copy your
intermediate gradients over to the cpu, backpropagate on the cpu, and
then copy them back to the gpu.
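
A minimal sketch of just the forward computation on the cpu (pinverse_on_cpu is a hypothetical helper name):

import torch

def pinverse_on_cpu(A):
    # run the (often more robust) cpu path for the svd-based pinverse,
    # then move the result back to A's original device
    return A.cpu().pinverse().to(A.device)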

Here is an illustration of some of these points:

>>> import torch
>>> torch.__version__
'1.10.0'
>>>
>>> _ = torch.manual_seed (2022)
>>>
>>> A = torch.randn (5, 5, device = 'cuda')
>>> Az = A.clone()
>>> Az[:, 4] = 0
>>> Ai = A.clone()
>>> Ai[2, 3] = float ('inf')
>>>
>>> A
tensor([[-0.4984,  1.0116, -0.2725, -2.4033, -0.0222],
        [ 1.0673,  0.9031, -0.7990,  2.6576, -1.7713],
        [ 0.2904,  0.1456, -0.0956, -2.0614, -0.5158],
        [-0.0729, -0.0587,  0.0698,  0.0745,  0.8687],
        [-2.3846,  2.1474, -0.1243,  0.6857, -0.0301]], device='cuda:0')
>>> A.inverse()
tensor([[-2.7031e+00,  6.0249e-01,  4.3399e+00,  3.7649e+00,  8.2871e-01],
        [-3.4336e+00,  6.6966e-01,  5.5499e+00,  4.6267e+00,  1.5518e+00],
        [-9.0598e+00,  5.3106e-01,  1.2691e+01,  8.5048e+00,  3.4168e+00],
        [-2.7618e-01,  9.6305e-02, -4.5012e-04,  1.9232e-01,  9.4890e-02],
        [ 2.9279e-01,  4.4850e-02, -2.8051e-01,  1.0797e+00, -1.0827e-01]],
       device='cuda:0')
>>> torch.allclose (A.pinverse(), A.inverse(), atol = 1.e-6)
True
>>>
>>> Az
tensor([[-0.4984,  1.0116, -0.2725, -2.4033,  0.0000],
        [ 1.0673,  0.9031, -0.7990,  2.6576,  0.0000],
        [ 0.2904,  0.1456, -0.0956, -2.0614,  0.0000],
        [-0.0729, -0.0587,  0.0698,  0.0745,  0.0000],
        [-2.3846,  2.1474, -0.1243,  0.6857,  0.0000]], device='cuda:0')
>>> Az.inverse()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: inverse_cuda: (Batch element 0): The diagonal element 5 is zero, the inversion could not be completed because the input matrix is singular.
>>> Az.pinverse()
tensor([[-3.1374,  0.5360,  4.7560,  2.1632,  0.9893],
        [-3.9337,  0.5931,  6.0290,  2.7825,  1.7368],
        [-9.6315,  0.4435, 13.2387,  6.3964,  3.6282],
        [-0.3025,  0.0923,  0.0248,  0.0951,  0.1046],
        [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000]], device='cuda:0')
>>>
>>> Ai
tensor([[-0.4984,  1.0116, -0.2725, -2.4033, -0.0222],
        [ 1.0673,  0.9031, -0.7990,  2.6576, -1.7713],
        [ 0.2904,  0.1456, -0.0956,     inf, -0.5158],
        [-0.0729, -0.0587,  0.0698,  0.0745,  0.8687],
        [-2.3846,  2.1474, -0.1243,  0.6857, -0.0301]], device='cuda:0')
>>> Ai.pinverse()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: svd_cuda: (Batch element 0): The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated singular values (error code: 6).
>>>
>>> B = torch.randn (5, 5, device = 'cuda')
>>> torch.allclose (A.pinverse().matmul (B), torch.linalg.lstsq (A, B).solution, atol = 1.e-6)
True

Good luck.

K. Frank
