Hessian matrix of logistic regression is not PSD

I implemented a logistic regression model with random initialization and random input, and I calculate the Hessian matrix of the loss with respect to the weights using torch.autograd.grad. Here is the code:

import torch
from torch import nn

class LogisticRegression(nn.Module):
    def __init__(self, input_dim):
        super(LogisticRegression, self).__init__()
        self.fc = nn.Linear(input_dim, 1, bias=False)
        self.sigmoid = nn.Sigmoid()
        with torch.no_grad():
            # random initialization
            self.fc.weight = nn.Parameter(torch.rand((1, input_dim)).sub(0.25))

    def forward(self, x):
        x = self.sigmoid(self.fc(x))
        return x.squeeze()

input_ = torch.rand((100, 100))
target = torch.rand(100)

model = LogisticRegression(input_dim=100)
output = model(input_)

criterion = nn.BCELoss(reduction="none")
loss = torch.sum(criterion(output, target))

""" Computing the hessian matrix """

hess = torch.zeros((100, 100))

grad = torch.autograd.grad(outputs=loss, inputs=model.fc.weight, create_graph=True)
grad = grad[0].squeeze()

for i, g in enumerate(grad):
    model.zero_grad()
    second_order_grad = torch.autograd.grad(outputs=g, inputs=model.fc.weight, retain_graph=True)
    second_order_grad = second_order_grad[0].squeeze()
    hess[i, :] = second_order_grad

""" Check if hessian is positive semi-definite """

eigenvalues, _ = torch.linalg.eig(hess)
print(eigenvalues)

Running the code, I got the following results:

tensor([ 1.2445e-03+0.0000e+00j, -4.1439e-04+0.0000e+00j,
         1.8660e-04+0.0000e+00j,  1.1943e-04+0.0000e+00j,
        -1.1723e-04+0.0000e+00j, -9.6112e-05+0.0000e+00j,
         8.6769e-05+0.0000e+00j, -6.3600e-05+0.0000e+00j,
        -6.2202e-05+0.0000e+00j,  7.0490e-05+0.0000e+00j,
        -4.6572e-05+0.0000e+00j,  6.4383e-05+0.0000e+00j,
         5.7739e-05+0.0000e+00j,  5.6093e-05+0.0000e+00j,
         4.4553e-05+0.0000e+00j, -4.2349e-05+0.0000e+00j,
        -3.3346e-05+0.0000e+00j,  4.6737e-05+0.0000e+00j,
         4.0225e-05+0.0000e+00j,  3.7023e-05+0.0000e+00j,
         3.2834e-05+0.0000e+00j, -2.8623e-05+0.0000e+00j,
        -2.3274e-05+0.0000e+00j,  3.1192e-05+0.0000e+00j,
         3.0093e-05+0.0000e+00j,  2.7552e-05+0.0000e+00j,
        -1.9698e-05+0.0000e+00j,  2.4640e-05+0.0000e+00j,
         2.3299e-05+0.0000e+00j,  2.0421e-05+0.0000e+00j,
        -1.5103e-05+0.0000e+00j, -1.3587e-05+0.0000e+00j,
         1.8011e-05+0.0000e+00j,  1.6778e-05+0.0000e+00j,
         1.5867e-05+0.0000e+00j,  1.4337e-05+0.0000e+00j,
        -1.1166e-05+0.0000e+00j, -1.0755e-05+0.0000e+00j,
        -9.8075e-06+0.0000e+00j,  1.2454e-05+0.0000e+00j,
         1.3076e-05+6.5081e-08j,  1.3076e-05-6.5081e-08j,
        -7.7048e-06+0.0000e+00j,  1.0609e-05+0.0000e+00j,
         1.1109e-05+0.0000e+00j, -7.4331e-06+0.0000e+00j,
        -6.7556e-06+0.0000e+00j,  9.6333e-06+0.0000e+00j,
         8.8359e-06+0.0000e+00j,  7.9485e-06+0.0000e+00j,
        -5.9216e-06+0.0000e+00j, -4.6393e-06+0.0000e+00j,
        -4.6548e-06+0.0000e+00j,  7.1046e-06+0.0000e+00j,
         6.9616e-06+0.0000e+00j,  5.9614e-06+0.0000e+00j,
         5.6388e-06+0.0000e+00j,  5.1294e-06+0.0000e+00j,
        -4.1212e-06+0.0000e+00j,  4.6867e-06+0.0000e+00j,
         3.9201e-06+0.0000e+00j,  3.5539e-06+6.5330e-08j,
         3.5539e-06-6.5330e-08j, -3.1284e-06+0.0000e+00j,
        -3.0800e-06+0.0000e+00j, -2.3478e-06+0.0000e+00j,
        -2.1333e-06+0.0000e+00j,  2.6568e-06+0.0000e+00j,
         2.4638e-06+0.0000e+00j,  2.1477e-06+0.0000e+00j,
        -1.3528e-06+0.0000e+00j, -1.1852e-06+0.0000e+00j,
         1.7787e-06+0.0000e+00j,  1.6337e-06+0.0000e+00j,
         1.4714e-06+0.0000e+00j, -9.1909e-07+0.0000e+00j,
        -8.1863e-07+0.0000e+00j,  1.1196e-06+0.0000e+00j,
         1.0123e-06+0.0000e+00j,  9.0910e-07+0.0000e+00j,
        -5.4653e-07+0.0000e+00j,  7.3642e-07+0.0000e+00j,
         5.7726e-07+0.0000e+00j, -4.1951e-07+0.0000e+00j,
         3.9256e-07+0.0000e+00j, -1.7926e-07+0.0000e+00j,
         3.3471e-07+0.0000e+00j,  2.6827e-07+0.0000e+00j,
         2.1603e-07+0.0000e+00j, -1.0832e-07+0.0000e+00j,
         1.1932e-07+1.2010e-08j,  1.1932e-07-1.2010e-08j,
        -6.6019e-08+0.0000e+00j,  7.9518e-08+0.0000e+00j,
         6.2423e-08+0.0000e+00j,  2.2673e-08+0.0000e+00j,
         6.9920e-09+0.0000e+00j,  1.1066e-09+1.8414e-09j,
         1.1066e-09-1.8414e-09j,  4.6437e-12+0.0000e+00j])

Apparently, some negative values appear among the eigenvalues. However, the Hessian matrix of logistic regression is supposed to be positive semi-definite, i.e., all eigenvalues should be greater than or equal to zero for arbitrary weights and inputs.
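
For what it's worth, my expectation comes from the closed form: for the summed BCE loss with a sigmoid output, the Hessian should be H = sum_i sigma_i * (1 - sigma_i) * x_i x_i^T, which is PSD for any weights and targets. A rough way to cross-check the autograd Hessian against that closed form, reusing the variables above, would be:

# Closed-form Hessian of the summed BCE loss with a sigmoid output:
#   H = sum_i sigma_i * (1 - sigma_i) * x_i x_i^T   (independent of the targets)
with torch.no_grad():
    sig = model(input_)                                   # sigma_i, shape (100,)
    hess_closed = (input_ * (sig * (1 - sig)).unsqueeze(1)).T @ input_

print(torch.max(torch.abs(hess - hess_closed)))           # should be tiny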

May I know what causes this issue? Thank you!

Hi Lalla!

This is a numerical issue. (Your code for calculating the Hessian looks
right.)

What’s at issue is that you are working with a 100 x 100 matrix (where
the 100 in question is the input_dim of your Linear, not the number of
samples in your batch).

As a matrix gets larger, it is common for it to become more ill-conditioned.
The so-called condition number of a matrix tells you, roughly, how much
any error (including numerical round-off error) might get “amplified” when
doing things like inverting the matrix or calculating its eigenvalues. One
common definition of the condition number is the ratio of the magnitudes
of the largest and smallest (in magnitude) eigenvalues of the matrix.

You can use torch.linalg.cond() to calculate the condition number of your
Hessian matrix.
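
For example, with the hess you have already computed:

# condition number of hess (2-norm: ratio of largest to smallest singular value)
print(torch.linalg.cond(hess))

# roughly the same quantity, viewed as the ratio of the extreme eigenvalue
# magnitudes of the symmetrized matrix
evals = torch.linalg.eigvalsh((hess + hess.T) / 2)
print(evals.abs().max() / evals.abs().min())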

You can do two things to explore this:

First, look at different size matrices. Repeat the calculation with increasing
values for input_dim, say, 5, 10, 20, 50, 100, and see how the condition
number grows as input_dim grows. (You may keep using 100 for the
batch size.) Single-precision floating-point arithmetic is accurate to
about 7 decimal digits. When your condition number grows to the order
of about 10^7, you can't expect the results for your eigenvalues – especially
the smaller eigenvalues – to remain accurate, and what should have
been a positive eigenvalue can become, numerically, negative.
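
Here is a rough, untested sketch of that experiment, wrapping your Hessian computation in a helper (the name hessian_of_loss is just illustrative) and reusing your LogisticRegression class:

def hessian_of_loss(input_dim, n_samples=100):
    """Rebuild the model and data at the given size and return the Hessian
    of the summed BCE loss with respect to the weight vector."""
    model = LogisticRegression(input_dim=input_dim)
    x = torch.rand((n_samples, input_dim))
    y = torch.rand(n_samples)
    loss = nn.BCELoss(reduction="none")(model(x), y).sum()
    grad, = torch.autograd.grad(loss, model.fc.weight, create_graph=True)
    rows = [torch.autograd.grad(g, model.fc.weight, retain_graph=True)[0].squeeze()
            for g in grad.squeeze()]
    return torch.stack(rows)

for d in (5, 10, 20, 50, 100):
    h = hessian_of_loss(d)
    print(d, torch.linalg.cond(h).item(),
          torch.linalg.eigvalsh((h + h.T) / 2).min().item())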

You can also repeat the calculation in double precision (using the same
input data converted to double precision). Treat the double precision as
the “correct” answer. (The double-precision calculation will have the
same numerical issues, but they won’t set in as soon.) As long as the
single-precision and double-precision results agree reasonably well, they
should be reasonably accurate. You should find that your eigenvalues
don’t become negative until the single-precision and double-precision
results start to deviate.
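
Here, too, an untested sketch, reusing input_, target, model, and hess from your code:

# rebuild the same model and data in double precision
model64 = LogisticRegression(input_dim=100).double()
with torch.no_grad():
    model64.fc.weight.copy_(model.fc.weight.double())

loss64 = nn.BCELoss(reduction="none")(model64(input_.double()), target.double()).sum()
grad64, = torch.autograd.grad(loss64, model64.fc.weight, create_graph=True)
hess64 = torch.stack([
    torch.autograd.grad(g, model64.fc.weight, retain_graph=True)[0].squeeze()
    for g in grad64.squeeze()
])

# eigvalsh returns eigenvalues in ascending order, so [:5] are the smallest,
# which are the ones that go wrong first in single precision
print(torch.linalg.eigvalsh((hess + hess.T).double() / 2)[:5])
print(torch.linalg.eigvalsh((hess64 + hess64.T) / 2)[:5])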

As a final comment, there are two ways numerically negative eigenvalues
could show up. First, the Hessian matrix you calculate has numerical error
in it. It could be that the Hessian matrix – with its numerical error – has
a negative eigenvalue. Second, it could be that the Hessian matrix – even
with its numerical error – doesn’t have a negative eigenvalue, but that
numerical error in the eigenvalue computation produces a negative value.
I’m not sure which is more likely in this specific case. Related to this, note
that your Hessian matrix, as calculated, will probably not be exactly
symmetric (due to round-off error).
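
A quick way to see the size of that asymmetry:

print((hess - hess.T).abs().max())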

Best.

K. Frank

Hi Frank,

Thanks for your quick and helpful response! As I gradually decrease the input dimension, the Hessian matrix does become PSD. Your insight is perfect!

I want to confirm: is converting the model to double precision the only way to increase the numerical stability? Thank you.

Hi Lalla!

To first approximation, yes, using double precision is your best bet for
reducing this issue.

Please note, double precision won’t eliminate that problem – it will just
postpone it some, and perhaps not by very much.

One thing you could try that might help marginally: Symmetrize your
Hessian matrix. Due to round-off error, your version of hess is not
exactly symmetric.

Setting hess = (hess + hess.T) / 2 will force hess to be symmetric
and might make the elements of hess a little more accurate. This won't
solve your problem, but it might reduce it a little bit.
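
Something like the following, which also switches from the general eigensolver to torch.linalg.eigvalsh(), the one intended for symmetric matrices:

# force exact symmetry, then use the symmetric eigensolver, which returns
# real eigenvalues in ascending order
hess_sym = (hess + hess.T) / 2
eigenvalues = torch.linalg.eigvalsh(hess_sym)
print(eigenvalues[:10])   # the smallest (most suspect) eigenvalues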

Also, what are you using hess for? If it’s critical for hess to be
positive-definite, but it’s okay for hess (and, in particular, its smallest
eigenvalues) to not be fully correct, you could add a small epsilon
to its diagonal.
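
For example, with hess_sym from above (the value of eps is just illustrative; you would pick it based on your use case):

# shift the whole spectrum up by eps to enforce positive-definiteness
eps = 1.0e-4
hess_pd = hess_sym + eps * torch.eye(hess_sym.shape[0])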

Another approach – depending upon how you use hess – would be
to only work in the subspace of hess for which the eigenvalues are
comfortably positive. You can do this by setting any eigenvalues that lie
below a certain threshold to zero, or, similarly, by performing a
singular-value decomposition and setting any singular values below a
certain threshold to zero.
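
A sketch of the eigenvalue-thresholding version (the threshold value is illustrative and would need tuning for your problem):

# keep only the part of hess_sym lying in the subspace of comfortably
# positive eigenvalues
threshold = 1.0e-5
evals, evecs = torch.linalg.eigh(hess_sym)
evals = torch.where(evals > threshold, evals, torch.zeros_like(evals))
hess_trunc = evecs @ torch.diag(evals) @ evecs.T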

Best.

K. Frank

Got it, thanks for your kind help!