About a torch.svd error

charlotte9 · July 5, 2022, 8:33am

Code:
U, S, V = torch.svd(torch.mm(src_vec, tgt_vec.t()))
sinpa = torch.sqrt(1 - torch.pow(cospa,2))
rsd_loss = torch.norm(sinpa,1)
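
To illustrate why this loss can produce bad gradients, here is a minimal sketch. It assumes cospa holds the singular values S (the snippet above does not show where cospa is defined), and it uses made-up values and an arbitrary eps of 1e-6. When a singular value is exactly 1, the argument of torch.sqrt is 0 and the backward pass yields a non-finite gradient. Clamping the argument away from zero is one possible workaround, not necessarily the right fix for every model.

```python
import torch

# Made-up stand-in for the singular values; 1.0 is the problematic edge case.
values = [1.0, 0.5]

# Original formulation: sqrt(1 - cos^2) has an unbounded derivative at cos = 1.
cospa = torch.tensor(values, requires_grad=True)
sinpa = torch.sqrt(1 - torch.pow(cospa, 2))
torch.norm(sinpa, 1).backward()
unsafe_grad = cospa.grad.clone()  # first entry is not finite

# Possible workaround (assumption): keep the sqrt argument at least eps.
eps = 1e-6  # arbitrary illustrative value
cospa_safe = torch.tensor(values, requires_grad=True)
sinpa_safe = torch.sqrt(torch.clamp(1 - torch.pow(cospa_safe, 2), min=eps))
torch.norm(sinpa_safe, 1).backward()
safe_grad = cospa_safe.grad.clone()  # all entries are finite

print(unsafe_grad, safe_grad)
```

If the clamped version trains without NaN, that would suggest the sqrt term, rather than the PyTorch version, is the source of the problem.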

When I train the model and use rsd_loss as the loss value, PyTorch always gives me an error:
Intel MKL ERROR: Parameter 4 was incorrect on entry to SLASCL.
RuntimeError: svd_cuda: the updating process of SBDSDC did not converge (error: 23)
Even after moving the tensors to the CPU, I still get the same error.

I then found that after twenty or thirty epochs of training, NaN and Inf values appear in the gradients. The weight updates therefore become NaN, all the features produced by the network become NaN, and those features are passed into torch.svd(). The SVD computation then reports the error above.
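
The chain described above can be caught earlier with a check in front of the SVD call. The sketch below is only an illustration: checked_svd is a hypothetical helper name, and the tensor shapes are invented. It raises a clear error as soon as the input contains NaN or Inf, instead of letting the values reach the LAPACK/cuSOLVER routine.

```python
import torch

def checked_svd(mat):
    # Hypothetical helper: fail early with a readable message when the
    # input already contains NaN or Inf, before the SVD routine sees it.
    if not torch.isfinite(mat).all():
        raise ValueError("non-finite values in SVD input")
    return torch.svd(mat)

# Invented shapes, standing in for the real feature matrices.
src_vec = torch.randn(4, 3)
tgt_vec = torch.randn(5, 3)
U, S, V = checked_svd(torch.mm(src_vec, tgt_vec.t()))

# A matrix containing NaN is rejected by the check.
bad = torch.full((4, 5), float("nan"))
try:
    checked_svd(bad)
    rejected = False
except ValueError:
    rejected = True
```

A check like this only reports the first iteration where the features go bad. It does not remove the cause, which still has to be found upstream in the loss or the gradients.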
Some say it's a PyTorch version problem, so I tried several versions (1.10.0, 1.7.0, and 0.4.1), but I get the same error with each of them.
How should this problem be solved? Which version of PyTorch should I use?