Hi bchen!
The answer is rather nuanced and the best (pytorch) explanation is given in the Autograd for Complex Numbers section of pytorch's autograd documentation.
There are two things going on here:
First, when you call .backward(), it uses, by default, gradient = None. The pytorch framework is designed to optimize real-valued loss functions. Because of this, when you call some_complex_loss.backward(), pytorch computes (in effect) some_complex_loss.real.backward(). (I’m pretty sure of this, but you might want to scour the above documentation to see if I’ve missed some nuance.)
Note, in particular, that .backward() computes, in the language of Wirtinger derivatives, d / dz* (not d / dz).
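Here is a quick sketch (mine, not from the documentation) that checks this convention for a single complex scalar leaf z with the real-valued loss (c * z).real: the stored gradient comes out as the conjugate of c, consistent with d / dz*.

import torch

# sanity-check sketch: for the real-valued loss (c * z).real, z.grad should be the conjugate of c
z = torch.tensor (1.0 + 1.0j, requires_grad = True)
c = 2.0 + 3.0j
loss = (c * z).real
loss.backward()
print (z.grad)   # should print tensor(2.-3.j), that is, the conjugate of c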
Second, you are computing the gradient with respect to a real tensor. (By default, Linear is instantiated with real weights.) This causes backward() not to store (or maybe not even compute) the full complex gradient – presumably because tensors and their .grad properties are required to be of the same type.
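To see this second point in isolation (again, my own sketch, not from the documentation), feed a real leaf tensor through a complex computation; its .grad comes back real, so a single backward pass can't give you the full complex gradient:

import torch

# sketch: a real leaf only ever gets a real-valued .grad, even when the "loss" is complex
w = torch.tensor ([1.0, 2.0], requires_grad = True)   # real, like Linear's default weights
z = w * (1.0 + 1.0j)                                  # complex intermediate
loss = (z * (2 + 3j)).sum()                           # complex "loss"
loss.real.backward()
print (w.grad)   # should print tensor([-1., -1.]) -- dtype torch.float32, no imaginary part

Contrast this with the complex-Linear example below, where the weight and bias grads are fully complex.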
If you need the gradient of the imaginary part, yes, call .backward() on .imag. (Keep in mind that .imag is itself a real tensor and that t_complex = t_complex.real + 1j * t_complex.imag.)
Here is an illustration of these points:
>>> import torch
>>> print (torch.__version__)
1.13.0
>>>
>>> _ = torch.manual_seed (2022)
>>>
>>> # create a complex Linear that happens to have real coefficients -- experimental
>>> linear_layer = torch.nn.Linear (2, 2).to (dtype = torch.complex64)
<path_to_pytorch_install>\torch\nn\modules\module.py:975: UserWarning: Complex modules are a new feature under active development whose design may change, and some modules might not work as expected when using complex tensors as parameters or buffers. Please file an issue at https://github.com/pytorch/pytorch/issues/new?template=bug-report.yml if a complex module does not work as expected.
warnings.warn(
>>>
>>> # input to Linear must also be complex
>>> test = linear_layer (torch.tensor ([[1. + 1.j, 2. + 2.j]]))
>>>
>>> (test * (2 + 3j)).sum().backward (retain_graph = True) # gradient = None (by default)
>>> linear_layer.weight.grad # now you get the full complex grad (of real part of "loss")
tensor([[-1.-5.j, -2.-10.j],
        [-1.-5.j, -2.-10.j]])
>>> linear_layer.bias.grad # now you get the full complex grad (of real part of "loss")
tensor([2.-3.j, 2.-3.j])
>>>
>>> linear_layer.zero_grad()
>>> (test * (2 + 3j)).sum().real.backward (retain_graph = True) # same as without .real
>>> linear_layer.weight.grad
tensor([[-1.-5.j, -2.-10.j],
        [-1.-5.j, -2.-10.j]])
>>> linear_layer.bias.grad
tensor([2.-3.j, 2.-3.j])
>>>
>>> linear_layer.zero_grad()
>>> (test * (2 + 3j)).sum().imag.backward (retain_graph = True) # .imag is (of course) different
>>> linear_layer.weight.grad
tensor([[ 5.-1.j, 10.-2.j],
        [ 5.-1.j, 10.-2.j]])
>>> linear_layer.bias.grad
tensor([3.+2.j, 3.+2.j])
Note the sign of the imaginary part of linear_layer.bias.grad. This is due to the use of d / dz* as the gradient.
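To make the signs concrete (my own sketch again): the bias contributes (2 + 3j) * b to the "loss." Writing b = u + 1j * v, the real part is 2 * u - 3 * v and the imaginary part is 3 * u + 2 * v, which is exactly where the 2.-3.j and 3.+2.j come from:

import torch

# sketch of the bias-grad signs: out = (2 + 3j) * b with b = u + 1j * v
b = torch.tensor (0.0 + 0.0j, requires_grad = True)
out = (2 + 3j) * b              # real part: 2*u - 3*v,  imaginary part: 3*u + 2*v

out.real.backward (retain_graph = True)
print (b.grad)                  # should print tensor(2.-3.j)

b.grad = None
out.imag.backward()
print (b.grad)                  # should print tensor(3.+2.j)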
I haven’t used functorch, but I believe that it uses autograd under the hood, with an overlay of “more efficient” loops. So I expect that these principles still apply and that the autograd documentation still tells you the (piece-wise) details of what is happening.
Best.
K. Frank