I’m struggling with the backward pass of SVD-based singular value thresholding, which keeps producing NaN gradients because the thresholding removes the small singular values to keep a low-rank structure. PyTorch warns that for a low-rank matrix or tensor, whose singular value list S is thresholded with some threshold tau, the gradients can become NaN when singular values are (nearly) equal or zero (see the warnings in the torch.linalg.svd documentation).
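Here is roughly what I am doing, reduced to a toy reproduction I put together for this post (the rank-1 5x5 matrix and tau = 0.1 are arbitrary choices of mine):

```python
import torch

# Toy reproduction: a rank-1 matrix has repeated (zero) singular values,
# so backpropagating through torch.linalg.svd yields NaN/Inf gradients.
torch.manual_seed(0)
A = torch.outer(torch.randn(5), torch.randn(5)).requires_grad_(True)  # rank 1

U, S, Vh = torch.linalg.svd(A)
tau = 0.1
S_thr = torch.relu(S - tau)              # singular value thresholding
X = U @ torch.diag(S_thr) @ Vh           # low-rank reconstruction

X.sum().backward()
print(A.grad)                            # typically full of NaNs here
```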
From this blog, Modify gradient computation of torch.linalg.svd, I learned that I should add a small eps to the denominator of the 1/(sigma_i**2 - sigma_j**2) terms in the SVD backward.
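For concreteness, here is my own rough sketch of that fix, written as a custom autograd.Function for a real, square input matrix only. The names StableSVD and svd_threshold, the eps value, and the diff / (diff**2 + eps) regularization are my choices; the backward follows the textbook square-matrix SVD formula, not PyTorch's internal implementation:

```python
import torch

class StableSVD(torch.autograd.Function):
    """SVD with an eps-regularized backward pass (my own sketch).

    The 1 / (sigma_j**2 - sigma_i**2) factors of the textbook SVD backward
    are replaced by diff / (diff**2 + eps), which stays finite when singular
    values repeat or vanish.  Real, square input matrices only.
    """

    @staticmethod
    def forward(ctx, A, eps):
        U, S, Vh = torch.linalg.svd(A)
        ctx.save_for_backward(U, S, Vh)
        ctx.eps = eps
        return U, S, Vh

    @staticmethod
    def backward(ctx, gU, gS, gVh):
        U, S, Vh = ctx.saved_tensors
        V = Vh.transpose(-2, -1)
        gV = gVh.transpose(-2, -1)   # cotangent w.r.t. V from the one w.r.t. Vh

        # F[i, j] ~ 1 / (sigma_j^2 - sigma_i^2), regularized: exactly 0 on the
        # diagonal and finite for (near-)repeated singular values.
        S2 = S * S
        diff = S2.unsqueeze(-2) - S2.unsqueeze(-1)   # diff[i, j] = sigma_j^2 - sigma_i^2
        F = diff / (diff * diff + ctx.eps)

        UtgU = U.transpose(-2, -1) @ gU
        VtgV = V.transpose(-2, -1) @ gV
        Sigma = torch.diag_embed(S)

        # Reverse-mode formula for the square-matrix SVD (see e.g. Townsend,
        # "Differentiating the Singular Value Decomposition"), with regularized F.
        inner = (F * (UtgU - UtgU.transpose(-2, -1))) @ Sigma \
                + Sigma @ (F * (VtgV - VtgV.transpose(-2, -1))) \
                + torch.diag_embed(gS)
        return U @ inner @ Vh, None


def svd_threshold(A, tau, eps=1e-10):
    """Singular value thresholding built on the regularized SVD above."""
    U, S, Vh = StableSVD.apply(A, eps)
    return U @ torch.diag_embed(torch.relu(S - tau)) @ Vh
```

I have only checked this against the textbook formula on paper; I would verify it with torch.autograd.gradcheck on a well-conditioned double-precision input before relying on it, and the eps value is just a guess.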
But I have some questions about this:
- Does this ‘add an eps’ trick make the low-rank result less accurate?
- Why do you think PyTorch doesn’t build this mechanism in? (As I understand it, any backpropagation through the SVD will run into this problem.)
- Is there a more elegant way to handle this low-rank problem?
I would really appreciate it if you could help me out.