I sometimes get NaNs in the output matrix when calling torch.linalg.cholesky. Other times, I get an error message that the matrix is not positive-definite (PD). Under what conditions does cholesky return NaN versus throwing a PD error?
I have the same issue.
When I run torch.linalg.cholesky on the same Hessian on different GPU servers, the behavior is different.
On the A6000 GPU, it just returns a matrix with NaN.
On the RTX8000 GPU, it shows this error message: "LinAlgError: linalg.cholesky: (Batch element 0): The factorization could not be completed because the input is not positive-definite (the leading minor of order 1 is not positive-definite)."
Is it a bug that torch.linalg.cholesky sometimes doesn't catch a non-PD matrix and just generates a NaN matrix?
Thanks,
Po-Yu Huang
Based on the small test below:
import torch

def test_cholesky_decomposition():
    results = []

    # Test 1: Valid positive definite matrix
    A = torch.tensor([[2.0, 1.0], [1.0, 2.0]])
    try:
        L = torch.linalg.cholesky(A)
        results.append(("Valid PD Matrix", torch.allclose(torch.mm(L, L.t()), A)))
    except Exception as e:
        results.append(("Valid PD Matrix", f"Failed: {str(e)}"))

    # Test 2: Not positive definite matrix
    A = torch.tensor([[1.0, 2.0], [2.0, 1.0]])
    try:
        torch.linalg.cholesky(A)
        results.append(("Not PD Matrix", "Failed to raise error"))
    except Exception as e:
        results.append(("Not PD Matrix", "not positive-definite" in str(e)))

    # Test 3: Matrix with NaN
    A = torch.tensor([[1.0, float('nan')], [float('nan'), 1.0]])
    try:
        L = torch.linalg.cholesky(A)
        results.append(("Matrix with NaN", torch.isnan(L).all()))
    except Exception as e:
        results.append(("Matrix with NaN", f"Raised exception: {str(e)}"))

    # Test 4: Almost singular matrix
    A = torch.tensor([[1e-8, 0], [0, 1e-8]])
    try:
        L = torch.linalg.cholesky(A)
        results.append(("Almost Singular Matrix", not torch.isnan(L).any()))
    except Exception as e:
        results.append(("Almost Singular Matrix", f"Raised exception: {str(e)}"))

    # Test 5: Negative diagonal
    A = torch.tensor([[1.0, 0], [0, -1.0]])
    try:
        torch.linalg.cholesky(A)
        results.append(("Negative Diagonal", "Failed to raise error"))
    except Exception as e:
        results.append(("Negative Diagonal", "not positive-definite" in str(e)))

    # Test 6: GPU behavior (if available)
    if torch.cuda.is_available():
        A = torch.tensor([[1.0, 0.5], [0.5, 1.0]], device='cuda')
        try:
            L = torch.linalg.cholesky(A)
            results.append(("GPU Behavior", torch.allclose(torch.mm(L, L.t()), A)))
        except Exception as e:
            results.append(("GPU Behavior", f"Failed: {str(e)}"))
    else:
        results.append(("GPU Behavior", "CUDA not available"))

    # Print results
    for test_name, result in results:
        print(f"{test_name}: {result}")

# Run the tests
test_cholesky_decomposition()
My understanding is that it treats the NaN matrix as just another non-PD matrix.
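The tests above only see whether an exception was raised. As a rough sketch, torch.linalg.cholesky_ex returns the factor together with an integer `info` code instead of raising by default, so the status can be inspected directly. Per its documentation, `info` is 0 on success and a positive value gives the order of the leading minor that is not positive-definite. The `matrices` dict and the `cholesky_info` helper are names made up for this example, and the NaN case is left out because this thread shows its result varies by device.

```python
import torch

# Three of the test matrices from the script above (names are illustrative).
matrices = {
    "valid PD": [[2.0, 1.0], [1.0, 2.0]],
    "not PD": [[1.0, 2.0], [2.0, 1.0]],
    "negative diagonal": [[1.0, 0.0], [0.0, -1.0]],
}

def cholesky_info(device="cpu"):
    """Return {name: info code} reported by torch.linalg.cholesky_ex on `device`."""
    # Stack the matrices into one batch of shape (3, 2, 2).
    batch = torch.tensor(list(matrices.values()), device=device)
    # check_errors defaults to False, so this call does not raise on failure.
    _, info = torch.linalg.cholesky_ex(batch)
    return dict(zip(matrices, info.tolist()))

print(cholesky_info("cpu"))
```

Calling `cholesky_info("cuda")` on each server would show whether the backend itself reports a failure for the non-PD inputs, or whether the status is already lost before PyTorch gets to check it.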
Hi @Soumya_Kundu,
Thanks for your great testing code.
Here are my results from different GPU servers.
- If I run your code directly, all servers give this:
Valid PD Matrix: True
Not PD Matrix: True
Matrix with NaN: Raised exception: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 2 is not positive-definite).
Almost Singular Matrix: True
Negative Diagonal: True
GPU Behavior: True
- I wanted to check every case on the GPU, so I added device='cuda' to all tensors.
The results are:
RTX8000:
Valid PD Matrix: True
Not PD Matrix: True
Matrix with NaN: False
Almost Singular Matrix: True
Negative Diagonal: True
GPU Behavior: True
A6000:
Valid PD Matrix: True
Not PD Matrix: Failed to raise error
Matrix with NaN: False
Almost Singular Matrix: True
Negative Diagonal: Failed to raise error
GPU Behavior: True
Both failed to raise an error for Matrix with NaN.
The A6000 also failed to raise an error for Not PD Matrix and Negative Diagonal.
Is it a bug? Thanks!
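Whatever the cause turns out to be, one defensive option is to stop relying on the exception alone and validate the result explicitly. The following is a minimal sketch, not an official recommendation: `checked_cholesky` and `CholeskyFailed` are hypothetical names introduced here, and the finite check on the factor is an extra guard for setups like the A6000 one above, where no error was raised.

```python
import torch

class CholeskyFailed(RuntimeError):
    """Hypothetical error type for this sketch; not part of PyTorch."""

def checked_cholesky(A):
    """Factor A, raising CholeskyFailed on any sign of a failed factorization."""
    # cholesky_ex returns (L, info) and does not raise by default.
    L, info = torch.linalg.cholesky_ex(A)
    # A non-zero info code means the backend reported a failure.
    if bool((info != 0).any()):
        raise CholeskyFailed(f"backend reported failure, info={info.tolist()}")
    # Device-independent guard: reject factors containing NaN or Inf.
    if not bool(torch.isfinite(L).all()):
        raise CholeskyFailed("factor contains NaN or Inf")
    return L
```

Both checks read values back from the device, so on CUDA they force a synchronization. That is the price of getting the same failure signal on every server.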
Hey @andyhahaha
I re-ran it with device='cuda' on everything, but I still get the same behavior as before:
Valid PD Matrix: True
Not PD Matrix: True
Matrix with NaN: Raised exception: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 2 is not positive-definite).
Almost Singular Matrix: True
Negative Diagonal: True
GPU Behavior: True
So I'm guessing this depends on the GPU rather than being a bug? Then again, it should behave the same irrespective of the GPU, so it may be a bug. Not sure, haha.
I ran this on Colab (Tesla T4).
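Since the results differ between machines, it may help to post the software and hardware details next to each result. Here is a small sketch for collecting them; `describe_environment` is a made-up helper name. Including the preferred linear algebra backend is only a guess at what might be relevant, since nothing in this thread establishes that the backend is the cause.

```python
import torch

def describe_environment():
    """Collect version and device details to compare across servers."""
    env = {
        "torch": torch.__version__,
        "cuda_runtime": torch.version.cuda,  # None on CPU-only builds
        "device": "cpu",
    }
    if torch.cuda.is_available():
        env["device"] = torch.cuda.get_device_name(0)
        env["capability"] = torch.cuda.get_device_capability(0)
        # Called without arguments, this returns the current backend preference.
        # The getattr guard covers older PyTorch versions that lack it.
        preferred = getattr(torch.backends.cuda, "preferred_linalg_library", None)
        if preferred is not None:
            env["linalg_backend"] = str(preferred())
    return env

for key, value in describe_environment().items():
    print(f"{key}: {value}")
```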