NaN in U@torch.diag(S)@V.t() for huge data

Ahmad_Khan · September 26, 2021, 7:13am

I am trying to reconstruct my input from its SVD, but I am getting NaN values in U@torch.diag(S)@V.t().

AlphaBetaGamma96 · September 26, 2021, 10:30am

Do you mind sharing a bit more about your data?

Also, check whether you’re using torch.svd or torch.linalg.svd; they return different matrices (torch.svd returns V, while torch.linalg.svd returns V already transposed).
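To make the difference concrete, here is a minimal sketch on a small random matrix (the shape and seed are made up for illustration, and it runs on the CPU). It only shows how each function's third return value has to be used in the reconstruction.

```python
import torch

torch.manual_seed(0)
# Small stand-in matrix; the shape is an assumption for illustration only.
A = torch.randn(5, 3, dtype=torch.double)

# torch.svd returns V, so V has to be transposed when reconstructing.
U, S, V = torch.svd(A)
A_from_svd = U @ torch.diag(S) @ V.t()

# torch.linalg.svd returns Vh (V already transposed).
# full_matrices=False gives the reduced factors so the shapes line up.
U2, S2, Vh = torch.linalg.svd(A, full_matrices=False)
A_from_linalg = U2 @ torch.diag(S2) @ Vh

print(torch.allclose(A, A_from_svd), torch.allclose(A, A_from_linalg))
```

Note that torch.linalg.svd defaults to full_matrices=True, so for a non-square input you would otherwise have to slice U or Vh before multiplying.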

Ahmad_Khan · September 26, 2021, 2:39pm

A set of 1000 ImageNet images of size 224*224, using torch.svd.

AlphaBetaGamma96 · September 26, 2021, 3:04pm

Would you mind sharing some code so it’s easier for me to understand where the bug appears?

Also, check whether there are any NaNs in your data (via torch.isnan(x).any()) before applying torch.svd. If you’re normalising your data, check that the variance is non-zero, because dividing by a zero variance will introduce NaNs or infs. And I assume you’re doing all 1000 images in one go? If so, you need to use torch.diag_embed(S) rather than torch.diag(S)!
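As a rough sketch of why torch.diag is the wrong call for a batch (the batch size and image size below are made up for illustration): for a 2-D S, torch.diag extracts a diagonal instead of building matrices, whereas torch.diag_embed builds one diagonal matrix per image.

```python
import torch

torch.manual_seed(0)
# Hypothetical batch of 4 small "images"; sizes chosen only for illustration.
x = torch.randn(4, 6, 6, dtype=torch.double)

# Batched SVD: S has shape [4, 6], one row of singular values per image.
U, S, V = torch.svd(x)

# On a 2-D input, torch.diag extracts the diagonal (a 1-D tensor here),
# so it does not give a batch of diagonal matrices.
print(torch.diag(S).shape)

# torch.diag_embed builds one diagonal matrix per batch entry: shape [4, 6, 6].
S_mat = torch.diag_embed(S)

# V.t() only works on 2-D tensors, so transpose the last two dims instead.
x_from_svd = U @ S_mat @ V.transpose(-2, -1)
print(torch.allclose(x, x_from_svd))
```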

Ahmad_Khan · September 27, 2021, 4:43am

I am normalising the images by subtracting the mean image (over all the images) from each image.
X = images.reshape(nImage, h*w)
Norm_image = X - mean_image
U, S, V = torch.svd(torch.from_numpy(Norm_image.astype(np.double)).cuda(), some=False)
I checked that the variance is non-zero.

Ahmad_Khan · September 28, 2021, 5:08am

@ptrblck, is there any solution for this?

AlphaBetaGamma96 · September 29, 2021, 11:08pm

Sorry, I didn’t get a ping for this. Could you check whether any of the images are full of zeros after you apply the mean-centering? Also, check that there are no infs or NaNs in your images before preconditioning.

images = torch.from_numpy(Norm_image.astype(np.double)).cuda()  # shape [1000, 224, 224]

print(torch.isnan(images).any())
print(torch.isinf(images).any())
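If it helps to see which images are the problem rather than just whether one exists, something like the following could work. This is only a sketch: report_bad_images is a hypothetical helper (not a PyTorch function), and the small tensor is fabricated to show the behaviour.

```python
import torch

def report_bad_images(images: torch.Tensor) -> dict:
    """Return indices of images that contain NaN/inf or are entirely zero.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    # Flatten everything except the batch dimension.
    flat = images.reshape(images.shape[0], -1)
    non_finite = (~torch.isfinite(flat)).any(dim=1)
    all_zero = (flat == 0).all(dim=1)
    return {
        "non_finite": torch.nonzero(non_finite).flatten().tolist(),
        "all_zero": torch.nonzero(all_zero).flatten().tolist(),
    }

torch.manual_seed(0)
images = torch.randn(5, 8, 8, dtype=torch.double)  # made-up batch
images[1] = 0.0                      # an image full of zeros
images[3, 0, 0] = float("nan")       # an image with a single NaN
print(report_bad_images(images))
```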

Also, could you try the following code and see if the NaN issue goes away?

# use torch.linalg.svd instead of torch.svd
# it returns V already transposed, so no extra transpose is needed
U, S, VT = torch.linalg.svd(images)
images_from_svd = U @ S.diag_embed() @ VT

Is this reshape right? Perhaps you are introducing the NaNs when mean-centering your images. If images has shape [B,W,H], you could just do image_mean = images.mean(dim=(-1,-2), keepdim=True) and then image_norm = images - image_mean (which is then passed to torch.linalg.svd(image_norm)). The keepdim=True is needed so that the mean has shape [B,1,1] and broadcasts correctly in the subtraction.
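Here is a small sketch of that mean-centering, next to the "subtract the mean image" variant described earlier in the thread, so the two shapes can be compared. The batch below is random stand-in data, not the real images.

```python
import torch

torch.manual_seed(0)
# Hypothetical [B, W, H] batch; sizes chosen only for illustration.
images = torch.rand(10, 16, 16, dtype=torch.double)

# Per-image mean: keepdim=True keeps shape [B, 1, 1] so the subtraction broadcasts.
per_image_mean = images.mean(dim=(-1, -2), keepdim=True)
centered_per_image = images - per_image_mean

# Mean image over the batch: shape [1, W, H], subtracted from every image.
mean_image = images.mean(dim=0, keepdim=True)
centered_by_mean_image = images - mean_image

print(per_image_mean.shape, mean_image.shape)
print(torch.isnan(centered_per_image).any(), torch.isnan(centered_by_mean_image).any())
```

Neither subtraction can create NaNs from finite inputs on its own, so if NaNs show up right after this step, the input already contained them.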

Also, remember to wrap your code in 3 backticks ``` so it’s shown nicely!

Ahmad_Khan · September 30, 2021, 4:12am

The reason I moved to torch.svd is the GPU memory limit: torch.linalg.svd was not able to load the data. Is it possible to run it in data-parallel mode?

AlphaBetaGamma96 · September 30, 2021, 2:25pm

Did you check the other things I mentioned?

You can reuse the images_from_svd line from my comment above with torch.svd; just remember to transpose V, because torch.svd returns V rather than its transpose.
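On the memory side, one thing that might be worth trying (an assumption on my part, not something I have tested on your data) is the reduced SVD, i.e. some=True in torch.svd. For a flattened [1000, 224*224] matrix, some=False asks for a full V of size 50176 x 50176, which is roughly 20 GB in double precision. A sketch with small stand-in sizes:

```python
import torch

torch.manual_seed(0)
# Small stand-ins for 1000 images of 224x224; the sizes are illustrative only.
n_images, h, w = 20, 8, 8
X = torch.randn(n_images, h * w, dtype=torch.double)
X = X - X.mean(dim=0, keepdim=True)  # subtract the mean image

# some=True (the default) returns the reduced factors:
# U is [n, k], S is [k], V is [h*w, k], with k = min(n, h*w).
U, S, V = torch.svd(X, some=True)

# torch.svd returns V, so it is transposed in the reconstruction.
X_from_svd = U @ torch.diag(S) @ V.t()

print(U.shape, S.shape, V.shape)
print(torch.allclose(X, X_from_svd))
```

With the reduced factors, V only has as many columns as there are images, so the shapes also line up for the product without any slicing.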