Function ‘LinalgSvdBackward0’ returned nan values in its 0th output

    @staticmethod
    def PSVT(X, tau, r=50):
        # Partial singular value thresholding: keep the first r singular
        # triplets unchanged and soft-threshold the remaining singular values.
        # X_1 = X + (torch.eye(X.size(0)) * 1e-5).cuda()
        [U, S, V] = torch.svd(X)
        V = torch.t(V)
        # rank-r part, reconstructed without thresholding
        Xd = torch.mm(
            torch.mm(U[:, 0:r], torch.diag(torch.complex(S[0:r], torch.tensor(0.0).cuda()))),
            torch.conj(torch.t(torch.t(V)[:, 0:r])))
        # soft-threshold the trailing singular values and keep only the nonzero ones
        diagS = nn.functional.relu(S[r:] - tau)
        diagS = torch.squeeze(diagS[torch.nonzero(diagS)])
        svp = np.prod(list(diagS.shape))
        if svp >= 1:
            diagS = torch.complex(diagS, torch.zeros_like(diagS))
            Xd = Xd + torch.mm(
                torch.mm(U[:, r:r + svp], torch.diag(diagS)),
                torch.conj(torch.t(torch.t(V)[:, r:r + svp])))
        return Xd
    
    def forward(self, t1, tau):
        output = torch.zeros(self.shape).cuda()
        P_fft_svt = torch.zeros(self.shape, dtype=torch.complex64).cuda()
        for b in range(self.batchsize):
            P = t1[b]
            # FFT along the first dimension, threshold each frontal slice
            # in the Fourier domain, then transform back
            P_fft = torch.fft.fft(P, dim=0)
            for c in range(self.channel):
                P_fft_svt[b, c, :, :] = self.PSVT(P_fft[c, :, :], tau)
            output[b, :, :, :] = torch.fft.irfft(P_fft_svt[b, :, :, :], n=3, dim=0)
        return output

I’m applying a truncated SVD to the input variable Z. The loss function is the MSE between the reconstructed Z and the ground truth, which leads to a RuntimeError: function ‘LinalgSvdBackward0’ returned nan values in its 0th output.
KFrank has already explained this problem, but because I mistakenly asked it in another topic, I am creating a new one here. I apologize for that mistake, which came from my unfamiliarity with the community rules.
Below is the old topic.

Function ‘LinalgEighBackward0’ returned nan

But I still have a question about this: in my network, the input t1 equals W - Q/u and tau equals 1/u, where u is the only parameter that needs to be learned; W and Q are initialized with torch.zeros. So it seems to me that PyTorch should not need to take gradients with respect to U or V.
I would appreciate your guidance on this issue. Thank you.

Hi Aya!

The return value of PSVT() depends on svd()'s U and V.

P and therefore P_fft depend on t1 (which you say depends on u), so
when you pass P_fft to PSVT(), U and V inside of PSVT() depend on
t1 (and hence u).

As noted in my post in the linked thread, these nan gradients can occur when
the matrix passed to svd() has degenerate (repeated) singular values.
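Here is a minimal illustration of what I mean (my own sketch, not your
code), where an identity matrix gives repeated singular values:

    import torch

    X = torch.eye(4, requires_grad=True)      # all singular values equal 1
    U, S, Vh = torch.linalg.svd(X)

    # a result that depends on U and Vh, not only on S
    loss = U.sum() + Vh.sum()
    loss.backward()

    print(X.grad)   # contains nan -- the svd backward divides by differences
                    # of (equal) squared singular values

(With torch.autograd.set_detect_anomaly(True) enabled, the backward()
call should raise the same ‘LinalgSvdBackward0’ error you are seeing.)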

As explained above, the output of your forward() function depends
on its input t1 (and hence u) through the call to PSVT() and, internal to
PSVT(), on svd()'s U and V matrices.

So when autograd uses the chain rule to compute the gradient of some_loss
with respect to u when you call some_loss.backward(), autograd does take
gradients with respect to U and V.
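Here is a sketch of your specific situation (the shapes, the ground truth,
and the loss are my guesses, not your actual code): even though W and Q are
zero and only u is a Parameter, t1 is built from u, so backward() has to
pass through svd()’s U and V on its way back to u, and the nan produced
there ends up in u.grad:

    import torch

    u = torch.nn.Parameter(torch.tensor(2.0))
    W = torch.zeros(4, 4)
    Q = torch.zeros(4, 4)

    t1 = W - Q / u                       # depends on u in the graph, even
                                         # though its value is all zeros
    U, S, Vh = torch.linalg.svd(t1)      # zero matrix: degenerate singular values
    recon = U @ torch.diag(S) @ Vh
    loss = (recon - torch.ones_like(recon)).pow(2).mean()
    loss.backward()

    print(u.grad)                        # nan -- the chain rule went through U and Vh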

You have to reformulate your problem somehow so that (near) degenerate
singular values don’t appear in your argument to svd(), or so that the result
doesn’t depend on U or V. (Your result may depend on the singular values,
S, themselves, because those gradients don’t diverge, but not on U or V.)
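For example (just a sketch of the kind of reformulation I mean), if you can
build your result from the singular values alone, you can use
torch.linalg.svdvals(), which returns only S, and the gradient stays finite
even when the singular values are repeated:

    import torch

    X = torch.eye(4, requires_grad=True)              # repeated singular values
    S = torch.linalg.svdvals(X)                       # only S, no U or V
    loss = torch.nn.functional.relu(S - 0.1).sum()    # a soft-threshold-style term
    loss.backward()

    print(X.grad)                                     # finite, no nan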

(As an aside, you should upgrade your code to use torch.linalg.svd()
because torch.svd() is deprecated.)
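The switch would look something like this (note that torch.linalg.svd()
returns Vh, the conjugate transpose of the V that torch.svd() returns, and
that full_matrices=False corresponds to torch.svd()’s default reduced
factorization):

    import torch

    X = torch.randn(6, 4)

    U_old, S_old, V_old = torch.svd(X)                              # deprecated
    U_new, S_new, Vh_new = torch.linalg.svd(X, full_matrices=False)

    # both factorizations reconstruct X
    print(torch.allclose(U_old @ torch.diag(S_old) @ V_old.t(), X, atol=1e-5))
    print(torch.allclose(U_new @ torch.diag(S_new) @ Vh_new, X, atol=1e-5))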

Best.

K. Frank