Hi, in my loss function I am calculating the eigendecomposition of the predicted and true hermitian matrices using torch.linalg.eigh. Because of the nature of the matrices, all of their eigenvalues are positive or zero, and I need to discard the eigenvectors that have an eigenvalue of zero. Is there any way to discard them without making the loss function non-differentiable? Thanks!
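For context, here is a toy version of the setup (just an illustration with a made-up matrix, not my actual code):

```python
import torch

# toy illustration of the setup (made-up matrix, not my real data):
# a rank-2 positive semi-definite hermitian matrix, so two of its four
# eigenvalues are (numerically) zero
a = torch.randn(4, 2, dtype=torch.complex64)
h = (a @ a.conj().T).requires_grad_(True)

eigenvalues, eigenvectors = torch.linalg.eigh(h)
print(eigenvalues)   # ascending order; the first two are ~0 up to floating-point noise
```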
Hi Wasabi!
As an aside, without knowing how you use the eigenvectors in your loss function, it’s
hard to know whether such a loss function would become non-differentiable.
More to the point, many pytorch operations are usefully differentiable (in the sense
of working in practice for gradient-descent optimization) even though they are not
technically differentiable. Specifically they are differentiable “almost everywhere” and
fail to be differentiable “on a set of measure zero.” abs (x) is a concrete example: it is differentiable for all x except x = 0 and is routinely used in models that are trained with backpropagation.
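Here is a quick toy illustration (my own snippet, not from your code) of abs () working with autograd even though it isn’t differentiable at zero: autograd simply uses a subgradient (zero) at that single point.

```python
import torch

x = torch.tensor([-2.0, 0.0, 3.0], requires_grad=True)
torch.abs(x).sum().backward()
print(x.grad)   # tensor([-1., 0., 1.]), with a subgradient of 0 used at x = 0
```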
In your case, as you vary the elements of your hermitian matrices, your eigenvalues
will generically change from non-zero to zero at isolated points (or, more correctly, on
a set of measure zero within the space of the elements of the hermitian matrices). So
this “isolated” non-differentiability won’t be an issue.
Note, however (independent of the issue of discarding eigenvectors), that if your loss function
depends on the eigenvectors (as compared to just the eigenvalues), the phases of the
eigenvectors are arbitrary, so the loss has to be independent of those arbitrary phases.
Also, if you have degenerate eigenvalues (that is, multiple eigenvalues of the same
value), eigenvectors that have the same eigenvalue lie in a well-defined “eigensubspace,”
but the specific eigenvectors (within that subspace) chosen to span that subspace are
arbitrary. Therefore your loss can depend on the eigensubspace, but not on the specific
eigenvectors chosen to span the subspace.
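To make that last point concrete, here is a small numerical illustration (my own toy example, not anything from your code): with a degenerate eigenvalue, any unitary mixing of the corresponding eigenvectors is an equally valid choice, but the projector onto the eigensubspace is the same no matter which choice you make, so a loss built from that projector is well defined.

```python
import math
import torch

# hermitian matrix with a doubly-degenerate eigenvalue: spectrum (1, 1, 3)
u = torch.linalg.qr(torch.randn(3, 3, dtype=torch.complex64))[0]       # random unitary
h = u @ torch.diag(torch.tensor([1, 1, 3], dtype=torch.complex64)) @ u.conj().T

vals, vecs = torch.linalg.eigh(h)
v12 = vecs[:, :2]                        # eigenvectors of the degenerate eigenvalue 1

# mix the degenerate pair with an arbitrary rotation: still valid eigenvectors
c, s = math.cos(0.7), math.sin(0.7)
mix = torch.tensor([[c, -s], [s, c]], dtype=torch.complex64)
w12 = v12 @ mix
print(torch.allclose(h @ w12, w12, atol=1e-4))                         # True

# the individual eigenvectors differ, but the eigensubspace projector does not
p_v = v12 @ v12.conj().T
p_w = w12 @ w12.conj().T
print(torch.allclose(p_v, p_w, atol=1e-5))                             # True
```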
Differentiability and discarding eigenvectors won’t be your problem, but, more generally,
a loss function that depends on eigenvectors can often lead to problems.
Best.
K. Frank
Thanks!
Because of the nature of the matrix I am using, the only degenerate eigenvalues are those equal to zero, and I take the square of the eigenvectors to remove the effect of the ambiguous phases of the eigenvectors.
So if I create a mask like mask = eigenvalues > 10**-7 and use it to select the eigenvectors with non-zero eigenvalues, like eigenvectors[:, mask], would that be alright, since it is pseudo-differentiable?
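Something like this rough sketch (the names are just placeholders):

```python
import torch

def keep_nonzero_eigenvectors(h: torch.Tensor, tol: float = 1e-7):
    # h: a hermitian, positive semi-definite matrix
    eigenvalues, eigenvectors = torch.linalg.eigh(h)
    mask = eigenvalues > tol                   # True for the non-zero eigenvalues
    # eigh returns eigenvectors as columns, so select columns, not rows;
    # boolean indexing keeps autograd tracking the kept entries
    return eigenvalues[mask], eigenvectors[:, mask]
```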
Hi Wasabi!
As an aside, if you literally take the square of your eigenvectors, you won’t be removing
the arbitrary phases – you would be doubling them. If you multiply your eigenvectors by
their complex conjugates, you would be cancelling out the arbitrary phases.
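For concreteness, here is a toy snippet (my own example) showing the difference: an arbitrary overall phase survives, doubled, in the square, but cancels when you multiply by the complex conjugate.

```python
import torch

v = torch.randn(3, dtype=torch.complex64)
v = v / v.norm()                                # a unit "eigenvector"
w = torch.exp(1j * torch.tensor(1.234)) * v     # the same vector with an arbitrary phase

print(torch.allclose(v**2, w**2))                               # False: phase is doubled
print(torch.allclose(v * v.conj(), w * w.conj(), atol=1e-6))    # True: phase cancels
```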
Yes, in the context you describe of having no non-zero degenerate eigenvalues, you should
be fine. (I suppose one could construct a perverse loss function that could mess things up,
but I wouldn’t expect that to happen in a realistic case.) When an eigenvalue crosses over
your mask threshold, you could have non-differentiability (and discontinuity), but lots of
models have that sort of behavior and can be trained just fine.
Best.
K. Frank