PCA using torch.svd() not converging error

Harsha_1412 · July 15, 2020, 9:37am

I was trying to do PCA using PyTorch to reduce dimensionality, and I am getting the following error:

RuntimeError: svd_cuda: the updating process of SBDSDC did not converge (error: 17)

This happens even if I convert my tensor to the CPU with cpu().
Also, when I remove the NaNs from the input to torch.svd(), the error goes away, but the loss (CrossEntropy) becomes NaN.
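For reference, a minimal PCA sketch in PyTorch follows. It assumes the rows of the input are samples and the columns are features, and it uses torch.linalg.svd (available from PyTorch 1.8), which returns Vh (the transpose of V) rather than the V returned by the older torch.svd used in this thread. The function name pca_reduce is hypothetical; the poster's actual code was not shared.

```python
import torch

def pca_reduce(x: torch.Tensor, k: int) -> torch.Tensor:
    """Project the rows of x (shape n x f) onto the top-k principal components."""
    # PCA is defined on mean-centered data, so center each feature (column).
    x_centered = x - x.mean(dim=0, keepdim=True)
    # Reduced SVD: U is (n, r), S is (r,), Vh is (r, f) with r = min(n, f).
    U, S, Vh = torch.linalg.svd(x_centered, full_matrices=False)
    if k > S.shape[0]:
        raise ValueError(f"k={k} exceeds the {S.shape[0]} available components")
    # Rows of Vh are the principal directions, ordered by decreasing singular value.
    return x_centered @ Vh[:k].T

x = torch.randn(500, 250)
z = pca_reduce(x, 100)
print(z.shape)  # torch.Size([500, 100])
```

Raising on an oversized k, rather than silently returning fewer columns, keeps a shape mismatch from surfacing later in the fully connected layers.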

ptrblck · July 16, 2020, 5:38am

I would recommend first trying to fix the NaN loss.
Do your model outputs or targets contain any NaNs, and are you expecting them during training?
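One quick way to answer this is to count non-finite values in each tensor just before the loss is computed. torch.isfinite flags both NaN and +/-inf, so it also catches infinities that torch.isnan would miss. The helper name and the example tensors below are hypothetical.

```python
import torch

def count_non_finite(t: torch.Tensor) -> int:
    """Return the number of NaN or +/-inf entries in t."""
    return int((~torch.isfinite(t)).sum().item())

# Hypothetical tensors standing in for a batch of logits and class targets.
logits = torch.tensor([[0.5, float("nan")], [1.0, 2.0]])
targets = torch.tensor([1, 0])

print(count_non_finite(logits))           # 1
print(count_non_finite(targets.float()))  # 0
```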

Harsha_1412 · July 16, 2020, 11:51am

No, neither my model outputs nor my targets contain any NaNs, and I am not expecting them during training. I also tried gradient clipping, but that made no difference.

ptrblck · July 16, 2020, 11:25pm

Where do the NaNs in the input to svd come from, if they are in neither the output nor the target?
How is node_embeddings calculated?

Harsha_1412 · July 17, 2020, 3:55am

node_embeddings is the output of a graph neural network, with shape (n x f), where n is the number of nodes in a particular graph and f is the number of features. n is around 30 and f is around 250, and I want to reduce f to 100.
The experiment runs successfully if I don't use svd. That is, if I pass all features directly through the subsequent fully connected layers, no errors occur and no NaNs are encountered. The issue only occurs when I try to reduce f using PCA and then process the result further.
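Assuming the PCA is applied per graph to the n x f node_embeddings matrix (the thread does not show the code), a reduced SVD of a 30 x 250 matrix yields at most min(n, f) = 30 components, so 100 components are not available from a single graph. After mean-centering, one singular value is also numerically zero. The torch.linalg.svd documentation warns that gradients through U or Vh are numerically unstable when singular values are repeated or close together, and, for rectangular inputs, when singular values are small; that is one plausible source of NaNs here, but it is a hypothesis, not something confirmed in the thread. The sketch below uses random data as a stand-in for the real embeddings.

```python
import torch

torch.manual_seed(0)
n, f = 30, 250  # sizes quoted in the thread; random data is a stand-in
x = torch.randn(n, f, dtype=torch.float64)
x_centered = x - x.mean(dim=0, keepdim=True)

U, S, Vh = torch.linalg.svd(x_centered, full_matrices=False)
print(S.shape, Vh.shape)  # torch.Size([30]) torch.Size([30, 250])

# Centering makes the rows sum to zero, so the rank is at most n - 1 = 29
# and the smallest singular value is numerically zero.
print((S[-1] / S[0]).item())
```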

ptrblck · July 17, 2020, 11:08am

I might have misunderstood the issue, but in the first post you said that you tried to "remove the NaNs in the input to torch.svd()".
Based on your description now, it seems that you are suddenly getting NaNs in the model output when you apply svd?
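To find where the NaNs first appear, PyTorch's anomaly detection (torch.autograd.detect_anomaly) raises an error naming the backward function that first returned NaN values. The sketch below wraps a forward and backward pass in it; the helper find_nan_source and the toy sqrt example are hypothetical stand-ins for the real model. Anomaly detection slows execution, so it is best enabled only while debugging.

```python
import torch

def find_nan_source(compute_loss):
    """Run forward + backward under anomaly detection; return the error text or None."""
    try:
        with torch.autograd.detect_anomaly():
            loss = compute_loss()
            loss.backward()
    except RuntimeError as err:
        # The message names the backward node that produced NaN, which would be
        # an SVD backward node if the NaNs originate in torch.svd.
        return str(err)
    return None

# Toy example: the gradient of sqrt at 0 is computed as 0 / (2 * 0) = NaN here.
x = torch.zeros(1, requires_grad=True)
print(find_nan_source(lambda: (torch.sqrt(x) * 0).sum()))
```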

Harsha_1412 · July 17, 2020, 11:45am

I “suspected” there might be NaNs, so I zeroed them out just to be safe.
But then I figured out that I am suddenly getting NaNs in the model output when I apply svd.
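If the NaNs do come from backpropagating through the SVD (unconfirmed in this thread), one common workaround, not proposed by either poster, is to treat the principal directions as constants: compute the SVD under torch.no_grad() so gradients reach node_embeddings only through the projection matmul and never pass through the SVD backward. This does change the gradient, since the projection basis is no longer differentiated. The function name and the random stand-in for node_embeddings are hypothetical.

```python
import torch

def pca_project_detached(x: torch.Tensor, k: int) -> torch.Tensor:
    """PCA projection where gradients reach x but do not pass through the SVD."""
    x_centered = x - x.mean(dim=0, keepdim=True)
    with torch.no_grad():
        # Autograd does not record the SVD, so Vh is treated as a constant.
        _, _, Vh = torch.linalg.svd(x_centered, full_matrices=False)
    k = min(k, Vh.shape[0])  # a reduced SVD yields at most min(n, f) directions
    return x_centered @ Vh[:k].T

# Hypothetical stand-in for node_embeddings from a GNN (n = 30, f = 250).
node_embeddings = torch.randn(30, 250, requires_grad=True)
reduced = pca_project_detached(node_embeddings, 20)
reduced.pow(2).sum().backward()
print(reduced.shape, torch.isfinite(node_embeddings.grad).all().item())  # torch.Size([30, 20]) True
```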