Hi bchen!

The answer is rather nuanced, and the best (pytorch) explanation is here, in the Autograd-for-Complex-Numbers documentation.

There are two things going on here:

First, when you call `.backward()`, it uses, by default, `gradient = None`.

The pytorch framework is designed to optimize real-valued loss functions. Because of this, when you call `some_complex_loss.backward()`, pytorch computes (in effect) `some_complex_loss.real.backward()`. (I’m pretty sure of this, but you might want to scour the above documentation to see if I’ve missed some nuance.)
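Here is a minimal sketch of just this point (a plain script with made-up values, relying on the same 1.13.0 behavior as the transcript further down; other versions may insist that you differentiate the real part explicitly):

```
import torch

# two identical complex leaf tensors (arbitrary illustrative values)
z1 = torch.tensor([1.0 + 1.0j, 2.0 + 2.0j], requires_grad=True)
z2 = z1.detach().clone().requires_grad_(True)

# complex "loss" with the default gradient = None ...
(z1 * (2 + 3j)).sum().backward()

# ... versus explicitly differentiating only its real part
(z2 * (2 + 3j)).sum().real.backward()

# the two gradients agree -- backward() on the complex loss is, in effect,
# backward() on its real part
print(torch.allclose(z1.grad, z2.grad))
```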

Note, in particular, that `.backward()` computes, in the language of Wirtinger derivatives, `d / dz*` (not `d / dz`).
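To see the conjugation concretely, here is a toy sketch where the loss is the real part of `z * c` for a fixed complex constant `c` (made-up values; the result matches the bias gradient in the transcript below):

```
import torch

c = 2 + 3j
z = torch.tensor([1.0 + 1.0j], requires_grad=True)

# loss = real part of z * c
(z * c).sum().real.backward()

# z.grad comes back as conj(c) = 2 - 3j (not c = 2 + 3j), i.e. the
# d / dz* (conjugate Wirtinger) gradient
print(z.grad)
```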

Second, you are computing the gradient with respect to a real tensor. (By default, `Linear` is instantiated with real weights.) This causes `backward()` not to store (or maybe not even compute) the full complex gradient – presumably because tensors and their `.grad` properties are required to be of the same type.
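Here is a toy sketch of this second point, with a plain real leaf tensor standing in for `Linear`’s real weights (made-up values):

```
import torch

# real leaf, analogous to Linear's default (real) weights
w = torch.tensor([1.0, 2.0], requires_grad=True)

# using w in a complex computation promotes the result to a complex dtype
out = w * torch.tensor([1.0 + 1.0j, 2.0 + 2.0j])
(out * (2 + 3j)).sum().real.backward()

# w.grad is stored with w's (real) dtype -- it holds only the derivative with
# respect to the real entries, not a full complex gradient
print(w.grad)         # tensor([-1., -2.]) for these values
print(w.grad.dtype)   # torch.float32
```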

If you need the gradient of the imaginary part, yes, call `.backward()` on `.imag`. (Be sure to take into account that `.imag` is *real*, so that `t_complex = t_complex.real + 1j * t_complex.imag`.)

Here is an illustration of these points:

```
>>> import torch
>>> print (torch.__version__)
1.13.0
>>>
>>> _ = torch.manual_seed (2022)
>>>
>>> # create a complex Linear that happens to have real coefficients -- experimental
>>> linear_layer = torch.nn.Linear (2, 2).to (dtype = torch.complex64)
<path_to_pytorch_install>\torch\nn\modules\module.py:975: UserWarning: Complex modules are a new feature under active development whose design may change, and some modules might not work as expected when using complex tensors as parameters or buffers. Please file an issue at https://github.com/pytorch/pytorch/issues/new?template=bug-report.yml if a complex module does not work as expected.
  warnings.warn(
>>>
>>> # input to Linear must also be complex
>>> test = linear_layer (torch.tensor ([[1. + 1.j, 2. + 2.j]]))
>>>
>>> (test * (2 + 3j)).sum().backward (retain_graph = True) # gradient = None (by default)
>>> linear_layer.weight.grad # now you get the full complex grad (of real part of "loss")
tensor([[-1.-5.j, -2.-10.j],
        [-1.-5.j, -2.-10.j]])
>>> linear_layer.bias.grad # now you get the full complex grad (of real part of "loss")
tensor([2.-3.j, 2.-3.j])
>>>
>>> linear_layer.zero_grad()
>>> (test * (2 + 3j)).sum().real.backward (retain_graph = True) # same as without .real
>>> linear_layer.weight.grad
tensor([[-1.-5.j, -2.-10.j],
        [-1.-5.j, -2.-10.j]])
>>> linear_layer.bias.grad
tensor([2.-3.j, 2.-3.j])
>>>
>>> linear_layer.zero_grad()
>>> (test * (2 + 3j)).sum().imag.backward (retain_graph = True) # .imag is (of course) different
>>> linear_layer.weight.grad
tensor([[ 5.-1.j, 10.-2.j],
        [ 5.-1.j, 10.-2.j]])
>>> linear_layer.bias.grad
tensor([3.+2.j, 3.+2.j])
```

Note the sign of the imaginary part of `linear_layer.bias.grad`. This is due to the use of `d / dz*` as the gradient.

I haven’t used `functorch`, but I believe that it uses autograd under the hood, with an overlay of “more efficient” loops. So I expect that these principles still apply and the autograd documentation still tells you the (piece-wise) details of what is happening.

Best.

K. Frank