1- You can add a small amount of noise to your matrix.

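2- NaNs usually show up when values overflow or an operation is undefined (for example inf - inf or 0/0), so you can clean the input with torch.nan_to_num, which replaces NaN and ±inf with finite numbers.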

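3- You can use SciPy's linalg.svd(X.cpu().detach().numpy(), full_matrices=False, lapack_driver="gesvd") (that is, scipy.linalg.svd), which I believe still has problems but is better than torch.svd. Keep in mind that detach() takes the result out of the autograd graph, so no gradient flows back through this call.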
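Here is a rough sketch of how the three suggestions could be combined. The helper name stabilized_svd, the noise_scale default, and the clamp values passed to torch.nan_to_num are my own assumptions, not anything prescribed by PyTorch. It also uses torch.linalg.svd (the newer replacement for torch.svd, returning Vh rather than V) and only falls back to SciPy if that call raises.

```python
import torch


def stabilized_svd(X, noise_scale=1e-6):
    """Hypothetical helper combining the three suggestions; not a PyTorch API."""
    # Suggestion 2: replace NaN / +inf / -inf with finite values (clamp values are assumptions).
    X = torch.nan_to_num(X, nan=0.0, posinf=1e6, neginf=-1e6)
    # Suggestion 1: add a little random noise to separate near-equal singular values.
    X = X + noise_scale * torch.randn_like(X)
    try:
        return torch.linalg.svd(X, full_matrices=False)
    except RuntimeError:
        # Suggestion 3: fall back to SciPy's gesvd driver.
        # The result is detached, so no gradient flows through this branch.
        from scipy import linalg
        U, S, Vh = linalg.svd(
            X.cpu().detach().numpy(), full_matrices=False, lapack_driver="gesvd"
        )
        return torch.from_numpy(U), torch.from_numpy(S), torch.from_numpy(Vh)


# A small matrix containing both NaN and inf entries.
X = torch.tensor([[1.0, float("nan")], [float("inf"), 2.0]])
U, S, Vh = stabilized_svd(X)
```

The noise changes the matrix slightly, so noise_scale should stay well below the precision you care about.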

The gradient of an SVD is only well defined when the singular values are distinct. When two of them are very close together the backward pass becomes numerically unstable and can produce inf or NaN. There’s more detail about this in the documentation here.
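To see why close singular values hurt, here is a small pure-Python illustration. As I understand the torch.linalg.svd documentation, the gradient depends on the singular values through a factor like 1 / (s_i^2 - s_j^2) for i != j. The function name svd_grad_scale is made up for this sketch and it only computes that factor, not an actual gradient.

```python
def svd_grad_scale(singular_values):
    """Largest 1 / |s_i^2 - s_j^2| over all pairs i != j (illustrative only)."""
    worst = 0.0
    n = len(singular_values)
    for i in range(n):
        for j in range(i + 1, n):
            gap = abs(singular_values[i] ** 2 - singular_values[j] ** 2)
            if gap == 0.0:
                # Repeated singular values: the factor is unbounded.
                return float("inf")
            worst = max(worst, 1.0 / gap)
    return worst


well_separated = svd_grad_scale([3.0, 2.0, 1.0])  # smallest gap is 4 - 1 = 3
nearly_equal = svd_grad_scale([1.0 + 1e-6, 1.0])  # gap is roughly 2e-6
repeated = svd_grad_scale([1.0, 1.0])             # gap is exactly 0
```

The factor stays small for well-separated values, grows to several hundred thousand for the nearly equal pair, and is infinite for the repeated pair, which is why adding a bit of noise to the matrix can help.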