Linalg.eig breaks for torch >= 1.11

I have this line in the forward pass of my code:

Lambda, Wtilde = torch.linalg.eig(Ktilde)

where Ktilde is a real square matrix, and Lambda and Wtilde are complex.

This used to work absolutely fine up to torch <= 1.10.

However, torch >= 1.11 introduced a new check in the backward pass of linalg.eig.

As a result, I now get:

RuntimeError: linalg_eig_backward: The eigenvectors in the complex case are specified up to multiplication by e^{i phi}. The specified loss function depends on this quantity, so it is ill-defined.

which crashes my code.

I don’t want to keep using torch <= 1.10. Is there a way to fix this?

You could try defining a custom torch.autograd.Function and writing the backward method manually? That would let you get around this issue, although you’ll have to derive the derivative yourself, which can get quite messy depending on the function.
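For what it’s worth, here is a minimal sketch of that approach, restricted to the eigenvalues only (the eigenvector part of the eig derivative is deliberately omitted, and EigvalsManual is a hypothetical name, not a PyTorch API):

```python
import torch

class EigvalsManual(torch.autograd.Function):
    """Sketch: eigenvalues of a real square matrix with a hand-written
    backward. Only the eigenvalue part of the eig derivative is
    implemented; gradients through the eigenvectors are not handled."""

    @staticmethod
    def forward(ctx, A):
        L, V = torch.linalg.eig(A)
        ctx.save_for_backward(V)
        return L

    @staticmethod
    def backward(ctx, grad_L):
        (V,) = ctx.saved_tensors
        # From d(lambda_i) = (V^{-1} dA V)_{ii} one gets
        #   gA = V^{-H} diag(grad_L) V^{H}
        Vh = V.mH
        gA = torch.linalg.solve(Vh, grad_L.unsqueeze(-1) * Vh)
        return gA.real  # the input was real, so return a real gradient

# Usage sketch: a phase-independent loss on the eigenvalues.
A = torch.randn(4, 4, dtype=torch.float64, requires_grad=True)
loss = EigvalsManual.apply(A).abs().pow(2).sum()
loss.backward()
```

Deriving the full derivative, including the eigenvector term, is where it gets messy (the Matrix Cookbook route mentioned below the fold).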

I was looking more for a way to skip the check that autograd does for e^{i phi}.

You’d have to check with a dev, but it’s most likely been added for numerical stability, as there’s a degeneracy in the phase of your eigenvectors, so your loss function isn’t well-defined. It might be best to get a dev to answer (e.g. @ptrblck, apologies for the tag!)

I do think you could get around this issue by defining your own torch.autograd.Function object but you’d have to derive the backward formula yourself which you could possibly find via the matrix cookbook or something similar.

Thanks for pinging @AlphaBetaGamma96!
I think you are right, and it seems the check was added in this PR, which explains:

It also corrects the forward AD formula for it to be correct. Now all
the tests pass for linalg.eig and linalg.eigvals.

with a follow-up here.

It seems to me that the check in this line tests whether the imaginary part of the eigenvectors is 0 (up to a specified tolerance). If it’s not, gradient computation will not occur.

Which brings me back to my original question – is there a way around this? I can try defining my own derivative as @AlphaBetaGamma96 suggested, but that can get quite messy. Is there a way to make Pytorch compute eigenvectors with uniquely defined phases, so that autograd can proceed? For example, any multiple of an eigenvector is also an eigenvector, but Pytorch gets around this magnitude non-uniqueness by normalizing the eigenvectors to have norm 1. Is there such an option for the phase as well?

I am hoping there can be a solution, because otherwise it seems to me that backward passes for torch.linalg.eig in torch >= 1.11 will frequently make the whole program crash.
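Edit: one thing I tried while waiting for answers is imposing a phase convention by hand after calling eig, in the same spirit as the norm-1 convention – rotating each eigenvector so that its first component is real and non-negative. A sketch (fix_phase is a hypothetical helper, not a PyTorch option), which only helps insofar as the loss built on the rotated vectors is then phase-independent:

```python
import torch

def fix_phase(vecs):
    """Rotate each eigenvector (a column of vecs) so that its first
    component is real and non-negative. If a column is replaced by
    e^{i phi} times itself, the output is unchanged, so quantities
    computed from the output no longer depend on the arbitrary phase.
    Caveat: numerically fragile if a first component is close to zero."""
    first = vecs[0]                 # first component of every column
    phase = first / first.abs()    # unit-modulus phase of that component
    return vecs / phase            # divides column j by phase[j]

# Usage sketch
A = torch.randn(3, 3, dtype=torch.float64, requires_grad=True)
L, V = torch.linalg.eig(A)
W = fix_phase(V)   # columns of W are still eigenvectors of A
```

This pins the phase by an arbitrary convention; whether doing so is mathematically meaningful is a separate question.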

Hi Sourya!

This is not correct. Backpropagation / gradient computation can, in
appropriate cases,
occur through torch.linalg.eig() where the
eigenvectors / eigenvalues have non-zero imaginary parts. (The code
you linked to that checks for an imaginary part tests an intermediate
result, not the eigenvectors of your Ktilde.)

The short answer is don’t try to compute gradients of quantities that depend
on the phases of your eigenvectors. (Pytorch does support computing
gradients of phase-independent quantities.)

There is not a good way around this. At issue is that the warning in the
documentation and the RuntimeError you cited in your original post
are fully legitimate. If you get that RuntimeError you are almost certainly
doing something that doesn’t make sense.

The problem – as mentioned in the documentation warning – is that the
phases of the eigenvectors are not mathematically uniquely defined. So
a loss function that depends on those phases, as well as the gradient of
such a loss function, is also not uniquely defined.

Pytorch could, hypothetically, impose an ad hoc set of rules that uniquely
determines those phases, but that would just give you ad hoc uniqueness
that would still be mathematically arbitrary, so you’d be sweeping your
issue under the rug, rather than actually fixing it. (Also, if you look at my
example, below, you can see that setting up those ad hoc rules could be
tricky, because there are multiple, mathematically-equivalent paths to
computing the same eigenvectors.)

The following example illustrates what is going on. As a device to generate
eigenvectors with differing phases, I apply an orthogonal transformation
to the original real square matrix, compute its eigenvectors, and then
rotate those eigenvectors back to the basis of the original matrix using
the same orthogonal transformation. You can understand this as a second,
fully mathematically legitimate algorithm for computing the eigenvectors.

To make clear that pytorch version 1.12 does permit you to backpropagate
through eig() provided you are computing the gradient of a
phase-independent quantity, I post the example as run in both versions
1.10 and 1.12. (These two example runs are almost identical duplicates
of one another, so there’s no need to compare them line by line. The
only difference is at the end where the 1.12 version flags the illegitimate
backpropagation.)

Here is the 1.10 version:

>>> import torch
>>> print (torch.__version__)
1.10.2
>>>
>>> _ = torch.manual_seed (2022)
>>>
>>> tA = torch.randn (3, 3)
>>> tB = tA.clone()
>>> tA.requires_grad = True
>>> tB.requires_grad = True
>>>
>>> valA, vecA = torch.linalg.eig (tA)
>>>
>>> orth = torch.linalg.svd (torch.randn (3, 3))[0]   # orthogonal matrix to rotate tB
>>> tBr = orth @ tB @ orth.T
>>> valBr, vecBr = torch.linalg.eig (tBr)   # eigenvectors of rotated tB
>>>
>>> tAc = tA.to (dtype = torch.complex64)
>>> orthc = orth.to (dtype = torch.complex64)
>>>
>>> vecB = orthc.T @ vecBr   # rotate eigenvectors back to basis of tB
>>> torch.allclose (tAc @ vecB, valA * vecB)   # check that vecB is indeed a set of eigenvectors of tA
True
>>>
>>> vecB / vecA   # each eigenvector is changed by a complex phase
tensor([[ 0.9991+0.0414j,  0.9991-0.0414j, -1.0000-0.0000j],
        [ 0.9991+0.0414j,  0.9991-0.0414j, -1.0000-0.0000j],
        [ 0.9991+0.0414j,  0.9991-0.0414j, -1.0000+0.0000j]],
       grad_fn=<DivBackward0>)
>>>
>>> lossGoodA = (vecA * vecA.conj())[0, 0]
>>> lossGoodB = (vecB * vecB.conj())[0, 0]
>>>
>>> lossGoodA
tensor(0.0332+0.j, grad_fn=<SelectBackward0>)
>>> lossGoodB
tensor(0.0332+0.j, grad_fn=<SelectBackward0>)
>>> torch.allclose (lossGoodA, lossGoodB)   # phase-independent loss is the same for both sets of eigenvectors
True
>>>
>>> lossGoodA.backward()   # both versions 1.10 and 1.12 permit legitimate backward pass
>>> tA.grad
tensor([[-0.0096,  0.4561, -0.3340],
        [ 0.0084, -0.0616,  0.0662],
        [-0.0009, -0.1106,  0.0712]])
>>>
>>> lossBadA = (vecA * vecA).real[0, 0]
>>> lossBadB = (vecB * vecB).real[0, 0]
>>>
>>> lossBadA
tensor(0.0198, grad_fn=<SelectBackward0>)
>>> lossBadB
tensor(0.0176, grad_fn=<SelectBackward0>)
>>> torch.allclose (lossBadA, lossBadB)   # this loss is not phase-independent
False
>>>
>>> lossBadB.backward()   # version 1.12 flags this backward pass as ill-defined
>>> tB.grad   # this gradient is phase-dependent (computed in version 1.10; not computed in version 1.12)
tensor([[ 0.0844,  0.2666,  0.0732],
        [-0.0095, -0.0822,  0.0267],
        [-0.0213, -0.0918, -0.0022]])

And here is the 1.12 version:

>>> import torch
>>> print (torch.__version__)
1.12.0
>>>
>>> _ = torch.manual_seed (2022)
>>>
>>> tA = torch.randn (3, 3)
>>> tB = tA.clone()
>>> tA.requires_grad = True
>>> tB.requires_grad = True
>>>
>>> valA, vecA = torch.linalg.eig (tA)
>>>
>>> orth = torch.linalg.svd (torch.randn (3, 3))[0]   # orthogonal matrix to rotate tB
>>> tBr = orth @ tB @ orth.T
>>> valBr, vecBr = torch.linalg.eig (tBr)   # eigenvectors of rotated tB
>>>
>>> tAc = tA.to (dtype = torch.complex64)
>>> orthc = orth.to (dtype = torch.complex64)
>>>
>>> vecB = orthc.T @ vecBr   # rotate eigenvectors back to basis of tB
>>> torch.allclose (tAc @ vecB, valA * vecB)   # check that vecB is indeed a set of eigenvectors of tA
True
>>>
>>> vecB / vecA   # each eigenvector is changed by a complex phase
tensor([[ 0.9991+0.0414j,  0.9991-0.0414j, -1.0000-0.0000j],
        [ 0.9991+0.0414j,  0.9991-0.0414j, -1.0000-0.0000j],
        [ 0.9991+0.0414j,  0.9991-0.0414j, -1.0000+0.0000j]],
       grad_fn=<DivBackward0>)
>>>
>>> lossGoodA = (vecA * vecA.conj())[0, 0]
>>> lossGoodB = (vecB * vecB.conj())[0, 0]
>>>
>>> lossGoodA
tensor(0.0332+0.j, grad_fn=<SelectBackward0>)
>>> lossGoodB
tensor(0.0332+0.j, grad_fn=<SelectBackward0>)
>>> torch.allclose (lossGoodA, lossGoodB)   # phase-independent loss is the same for both sets of eigenvectors
True
>>>
>>> lossGoodA.backward()   # both versions 1.10 and 1.12 permit legitimate backward pass
>>> tA.grad
tensor([[-0.0096,  0.4561, -0.3340],
        [ 0.0084, -0.0616,  0.0662],
        [-0.0009, -0.1106,  0.0712]])
>>>
>>> lossBadA = (vecA * vecA).real[0, 0]
>>> lossBadB = (vecB * vecB).real[0, 0]
>>>
>>> lossBadA
tensor(0.0198, grad_fn=<SelectBackward0>)
>>> lossBadB
tensor(0.0176, grad_fn=<SelectBackward0>)
>>> torch.allclose (lossBadA, lossBadB)   # this loss is not phase-independent
False
>>>
>>> lossBadB.backward()   # version 1.12 flags this backward pass as ill-defined
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<path_to_pytorch_install>\torch\_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "<path_to_pytorch_install>\torch\autograd\__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: linalg_eig_backward: The eigenvectors in the complex case are specified up to multiplication by e^{i phi}. The specified loss function depends on this quantity, so it is ill-defined.
>>> tB.grad   # this gradient is phase-dependent (computed in version 1.10; not computed in version 1.12)
>>>

Just to be clear, the RuntimeError raised in version >= 1.11 isn’t breaking
your code – it’s helpfully warning you that your code is already broken.

Best.

K. Frank


Thank you for the great explanation.

Unfortunately, due to numerical imprecision, the tensors in my code end up with very small imaginary parts (e.g. 2e-08j in single precision, or 2e-17j in double precision). I cannot keep these complex values, because they will then pass through torch.nn layers and so should be real. So I tried to remove the imaginary parts using torch.abs(). I think this is what causes the RuntimeError: when I do torch.abs(x), I am essentially doing torch.sqrt(x * x.conj()).real, and this causes the autograd graph to depend on the phase.

I don’t know what to do about this issue. I need to remove those tiny imaginary parts, but in a way so that the autograd graph is not affected.

Can’t you just do something like x = x.real? Or, if you need it to remain complex, you could cast it back to complex with an imaginary part of torch.zeros_like(x.real), so you have a value that’s purely real with a zero imaginary component?

That shouldn’t affect backprop with these small imaginary components?

Yeah, I tried that; it doesn’t work, unfortunately. x.real throws away the imaginary part, but that operation becomes part of the computational graph, so the gradient is traced back to the imaginary components.

I also cannot do with torch.no_grad(): x.real, since I need the downstream gradients.

Hi Sourya!

Could you explain how these “numerical imprecisions” are related to
torch.linalg.eig()?

On the contrary, torch.abs (x) = torch.sqrt (x * x.conj()).real
is independent of the phase. A simple test shows that backpropagating
an eigenvector (with a non-trivial phase) through abs() works just fine:

>>> import torch
>>> print (torch.__version__)
1.12.0
>>>
>>> _ = torch.manual_seed (2022)
>>>
>>> t = torch.randn (3, 3, requires_grad = True)
>>> val, vec = torch.linalg.eig (t)
>>> loss = vec[0, 0].abs()
>>> loss.backward()
>>> t.grad
tensor([[-0.0265,  1.2525, -0.9171],
        [ 0.0231, -0.1691,  0.1819],
        [-0.0026, -0.3037,  0.1956]])

Could you post a truly minimal, complete, runnable script that reproduces
the issue you think you are having with torch.linalg.eig()?

It should be as simple as: create some sample data; call torch.linalg.eig();
compute a phase-independent loss function; and call loss.backward().
It should work.

If you call loss.backward() on a phase-dependent loss function, we do
expect it to fail, in which case we have to figure out what you are trying to
accomplish with your phase-dependent loss function and figure out how
you might actually do it correctly – that is, with a phase-independent loss.

Best.

K. Frank


Here’s a folder containing main.py (and its dependencies, helper.py and the data file Xtr.pt) that reproduces the issue.

I could not make it simpler without disrupting functionality; hopefully it is straightforward enough for you to take a look. Just run main.py. The linalg.eig phase error occurs when computing loss.backward() in epoch 3.

FYI my torch version is 1.12.0 and numpy version is 1.23.0.

Your help is much appreciated, thanks a lot in advance!

I have figured out the error: it was actually in a different portion of the code, where I was using torch.exp instead of torch.linalg.matrix_exp. Consequently, I have also deleted the folder I linked above.

Thanks so much for this conversation, particularly @KFrank . All of your answers helped me to re-examine my code in detail and figure out the bug. Much appreciated!!