Cholesky nan vs not PD

Albert_T · August 25, 2023, 3:42am

I sometimes get NaNs in the result when calling torch.linalg.cholesky. Other times, I get an error message saying the matrix is not positive-definite (PD). Under what conditions does cholesky return NaN instead of raising a PD error?

andyhahaha · September 5, 2024, 7:10am

I have the same issue.
When I run torch.linalg.cholesky on the same Hessian on different GPU servers, the behavior is different.
On an A6000 GPU, it just returns a matrix containing NaN.
On an RTX8000 GPU, it shows this error message: "LinAlgError: linalg.cholesky: (Batch element 0): The factorization could not be completed because the input is not positive-definite (the leading minor of order 1 is not positive-definite)."

Is it a bug that torch.linalg.cholesky sometimes doesn't catch a non-PD matrix and just returns a NaN matrix?

Thanks,
Po-Yu Huang
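A small diagnostic can help pin down where the difference comes from. The sketch below (the function names `report_cholesky` and `compare_devices` are hypothetical) runs the same matrix through torch.linalg.cholesky_ex, which by default reports failures in an `info` tensor instead of raising, on the CPU and on every visible CUDA device, and records whether the factor contains NaN. Printing `torch.__version__`, `torch.version.cuda` and `torch.cuda.get_device_name()` alongside it makes results from different servers comparable.

```python
import torch


def report_cholesky(A: torch.Tensor, device: torch.device) -> dict:
    """Hypothetical diagnostic: factor A on `device` without raising."""
    # cholesky_ex does not raise by default; it returns (L, info) instead.
    L, info = torch.linalg.cholesky_ex(A.to(device))
    return {
        "device": str(device),
        "info": info.cpu().tolist(),            # 0 = success, k > 0 = minor k failed
        "has_nan": bool(torch.isnan(L).any()),  # NaN anywhere in the factor?
    }


def compare_devices(A: torch.Tensor) -> list:
    """Run report_cholesky on the CPU and on each visible CUDA device."""
    devices = [torch.device("cpu")]
    if torch.cuda.is_available():
        devices += [torch.device(f"cuda:{i}") for i in range(torch.cuda.device_count())]
    return [report_cholesky(A, d) for d in devices]


if __name__ == "__main__":
    print("torch", torch.__version__, "| CUDA", torch.version.cuda)
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            print(f"cuda:{i}", torch.cuda.get_device_name(i))
    not_pd = torch.tensor([[1.0, 2.0], [2.0, 1.0]])
    for row in compare_devices(not_pd):
        print(row)
```

Running it on each server with the Hessian in place of `not_pd` shows directly whether a given device reports a nonzero `info`, returns NaN, or both.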

Soumya_Kundu · September 5, 2024, 2:39pm

Based on the small test below:

import torch

def test_cholesky_decomposition():
    results = []

    # Test 1: Valid positive definite matrix
    A = torch.tensor([[2.0, 1.0], [1.0, 2.0]])
    try:
        L = torch.linalg.cholesky(A)
        results.append(("Valid PD Matrix", torch.allclose(torch.mm(L, L.t()), A)))
    except Exception as e:
        results.append(("Valid PD Matrix", f"Failed: {str(e)}"))

    # Test 2: Not positive definite matrix
    A = torch.tensor([[1.0, 2.0], [2.0, 1.0]])
    try:
        torch.linalg.cholesky(A)
        results.append(("Not PD Matrix", "Failed to raise error"))
    except Exception as e:
        results.append(("Not PD Matrix", "not positive-definite" in str(e)))

    # Test 3: Matrix with NaN
    A = torch.tensor([[1.0, float('nan')], [float('nan'), 1.0]])
    try:
        L = torch.linalg.cholesky(A)
        results.append(("Matrix with NaN", torch.isnan(L).all()))
    except Exception as e:
        results.append(("Matrix with NaN", f"Raised exception: {str(e)}"))

    # Test 4: Almost singular matrix
    A = torch.tensor([[1e-8, 0], [0, 1e-8]])
    try:
        L = torch.linalg.cholesky(A)
        results.append(("Almost Singular Matrix", not torch.isnan(L).any()))
    except Exception as e:
        results.append(("Almost Singular Matrix", f"Raised exception: {str(e)}"))

    # Test 5: Negative diagonal
    A = torch.tensor([[1.0, 0], [0, -1.0]])
    try:
        torch.linalg.cholesky(A)
        results.append(("Negative Diagonal", "Failed to raise error"))
    except Exception as e:
        results.append(("Negative Diagonal", "not positive-definite" in str(e)))

    # Test 6: GPU behavior (if available)
    if torch.cuda.is_available():
        A = torch.tensor([[1.0, 0.5], [0.5, 1.0]], device='cuda')
        try:
            L = torch.linalg.cholesky(A)
            results.append(("GPU Behavior", torch.allclose(torch.mm(L, L.t()), A)))
        except Exception as e:
            results.append(("GPU Behavior", f"Failed: {str(e)}"))
    else:
        results.append(("GPU Behavior", "CUDA not available"))

    # Print results
    for test_name, result in results:
        print(f"{test_name}: {result}")

# Run the tests
test_cholesky_decomposition()

My understanding is that, as far as cholesky is concerned, a matrix containing NaN is just another non-PD matrix.
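That reading matches how a textbook Cholesky treats NaN. In the plain-Python sketch below (the function `cholesky_lower` and its `nan_is_failure` flag are hypothetical), the matrix from Test 3 gives l11 = 1, then l21 = NaN / 1 = NaN, and the second pivot 1 - NaN**2 is NaN. Because every comparison with NaN is false under IEEE 754, a pivot check written only as `pivot <= 0` lets the NaN through and the factor fills with NaN, whereas a check that also tests for NaN fails at the leading minor of order 2, the same order the CPU error message reported. This illustrates the mechanism only; it is not a description of what the CPU or GPU backends do internally.

```python
import math


def cholesky_lower(A, nan_is_failure=True):
    """Hypothetical textbook Cholesky (A = L L^T) on a list-of-lists matrix.

    Raises ValueError naming the order k of the first failing leading minor.
    With nan_is_failure=False the pivot test is only `pivot <= 0`, which a
    NaN pivot silently passes.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal pivot: a_jj minus the squares already placed in row j.
        pivot = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if pivot <= 0 or (nan_is_failure and math.isnan(pivot)):
            raise ValueError(f"leading minor of order {j + 1} is not positive-definite")
        L[j][j] = math.sqrt(pivot)  # math.sqrt(nan) returns nan, no error
        # Fill column j below the diagonal.
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L
```

For example, `cholesky_lower([[1.0, float("nan")], [float("nan"), 1.0]], nan_is_failure=False)` returns `[[1.0, 0.0], [nan, nan]]`. In this sketch the upper triangle stays 0.0 even in the NaN case, so an `all()` check like the one in Test 3 would report False; `torch.isnan(L).any()` is the stricter test for a factor that is unusable.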

andyhahaha · September 6, 2024, 2:50am

Hi @Soumya_Kundu,

Thanks for your great testing code.
Here are my results from different GPU servers.

If I run your code as-is, all servers give this:
Valid PD Matrix: True
Not PD Matrix: True
Matrix with NaN: Raised exception: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 2 is not positive-definite).
Almost Singular Matrix: True
Negative Diagonal: True
GPU Behavior: True

I wanted to check all of the behavior on the GPU, so I added device='cuda' to every tensor.

The results are:

RTX8000:
Valid PD Matrix: True
Not PD Matrix: True
Matrix with NaN: False
Almost Singular Matrix: True
Negative Diagonal: True
GPU Behavior: True

A6000:
Valid PD Matrix: True
Not PD Matrix: Failed to raise error
Matrix with NaN: False
Almost Singular Matrix: True
Negative Diagonal: Failed to raise error
GPU Behavior: True

Both GPUs failed to raise an error for the matrix with NaN.
The A6000 also failed to raise an error for the Not PD Matrix and Negative Diagonal cases.
Is this a bug? Thanks!
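Whatever the cause turns out to be, if a GPU can return a factor for a non-PD matrix without raising, one defensive option is not to rely on the kernel's error reporting alone. The sketch below (the helper name `verified_cholesky` and its `rtol`/`atol` defaults are assumptions) checks the `info` value from torch.linalg.cholesky_ex and then independently checks that the factor reconstructs the input, since any real factor must satisfy L @ L.mH ≈ A. torch.allclose treats NaN as unequal by default, so a factor containing NaN fails that check too.

```python
import torch


def verified_cholesky(A: torch.Tensor, rtol: float = 1e-5, atol: float = 1e-6) -> torch.Tensor:
    """Hypothetical helper: Cholesky that validates its own result.

    Raises ValueError instead of returning NaN or a wrong factor, without
    depending on the backend to raise. rtol/atol are illustrative defaults
    and may need tuning for float32 or large, ill-conditioned matrices.
    """
    # cholesky_ex does not raise by default; failures are reported in `info`
    # (0 = success, k > 0 = leading minor of order k not positive-definite).
    L, info = torch.linalg.cholesky_ex(A)
    if (info != 0).any():
        raise ValueError(f"not positive-definite (info={info.tolist()})")
    # Independent check in case a backend reports success anyway: a genuine
    # factor must reproduce A. NaN anywhere in L makes allclose return False.
    if not torch.allclose(L @ L.mH, A, rtol=rtol, atol=atol):
        raise ValueError("Cholesky factor does not reconstruct the input")
    return L
```

On its own, the reconstruction test certifies only that A is (numerically) positive semi-definite, which is why the sketch keeps the `info` check for the strict PD case.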

Soumya_Kundu · September 8, 2024, 10:44am

Hey @andyhahaha

I re-ran it with CUDA on everything, but I still get the same behaviour:

Valid PD Matrix: True
Not PD Matrix: True
Matrix with NaN: Raised exception: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 2 is not positive-definite).
Almost Singular Matrix: True
Negative Diagonal: True
GPU Behavior: True

So I'm guessing this depends on the GPU rather than being a bug? Then again, it should behave the same regardless of the GPU, so maybe it is a bug? Not sure, haha.

I ran this on Colab (Tesla T4).
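Since the same code behaves differently on a T4, an RTX8000 and an A6000, one more variable worth isolating (a sketch, not a confirmed cause) is the CUDA linear-algebra backend: PyTorch can run cholesky through either cuSOLVER or MAGMA, and torch.backends.cuda.preferred_linalg_library lets you pin one. The helper `compare_linalg_backends` is hypothetical, and it assumes your PyTorch build ships with MAGMA; if the two backends disagree on the same matrix, that is useful detail for a bug report.

```python
import torch


def compare_linalg_backends(A: torch.Tensor) -> dict:
    """Hypothetical check: run cholesky_ex on CUDA under each backend choice."""
    results = {}
    if not torch.cuda.is_available():
        return results  # nothing to compare without a GPU
    # Called with no argument, this returns the current preference.
    previous = torch.backends.cuda.preferred_linalg_library()
    try:
        for backend in ("cusolver", "magma"):
            torch.backends.cuda.preferred_linalg_library(backend)
            L, info = torch.linalg.cholesky_ex(A.to("cuda"))
            results[backend] = {
                "info": info.cpu().tolist(),            # 0 = success
                "has_nan": bool(torch.isnan(L).any()),  # NaN in the factor?
            }
    finally:
        # The preference is global, so restore whatever was set before.
        torch.backends.cuda.preferred_linalg_library(previous)
    return results


if __name__ == "__main__":
    print(compare_linalg_backends(torch.tensor([[1.0, 0.0], [0.0, -1.0]])))
```

It returns an empty dict on a machine without CUDA.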