Autograd on complex numbers

ztan · February 21, 2022, 5:23pm

Dear PyTorch Developers,

I started to play with the autograd function in PyTorch, and wrote the following simple example:

import numpy as np
import torch

dat = np.array([[1. + 2.j, 3. + 4.j], [5. + 6.j, 7. + 8.j]], dtype=np.complex128)
x = torch.tensor(dat, requires_grad=True)

y = 2.0 * x**2

extern_grad = torch.tensor(np.ones_like(dat))
y.backward(gradient=extern_grad)

print(4 * dat)
print(x.grad)

The printed results show that the analytically computed gradient (4 * dat) is the complex conjugate of the autograd result (x.grad). However, I expected them to be identical. Is there an error in how I used autograd?

With best regards,

cahity · February 21, 2022, 6:35pm

In PyTorch 1.6.0, I get identical results for both gradient calculations. In the latest PyTorch version (1.10.0), though, I get what you got. This is because autograd now computes gradients using Wirtinger calculus, which yields the conjugate for a holomorphic function like this one.

More info at: https://pytorch.org/docs/stable/notes/autograd.html#autograd-for-complex-numbers
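
As a rough illustration of where the conjugate comes from, here is a minimal pure-Python sketch (no PyTorch required). It estimates the two Wirtinger derivatives of f(z) = 2z² by finite differences and combines them as conj(grad_output) · ∂f/∂z̄ + grad_output · conj(∂f/∂z), which is how the linked notes appear to define the backward rule. The helper names `wirtinger` and `backward_rule` are made up for this example and are not PyTorch APIs.

```python
def wirtinger(f, z, h=1e-6):
    """Estimate (df/dz, df/dz_bar) at z with central finite differences."""
    df_dx = (f(z + h) - f(z - h)) / (2 * h)            # slope along the real axis
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # slope along the imaginary axis
    d_dz = 0.5 * (df_dx - 1j * df_dy)
    d_dzbar = 0.5 * (df_dx + 1j * df_dy)
    return d_dz, d_dzbar


def backward_rule(f, z, grad_output=1 + 0j):
    """Hypothetical helper: combine the derivatives as described in the lead-in."""
    d_dz, d_dzbar = wirtinger(f, z)
    return grad_output.conjugate() * d_dzbar + grad_output * d_dz.conjugate()


def f(z):
    return 2.0 * z ** 2


z = 1.0 + 2.0j
analytic = 4 * z                     # textbook derivative of the holomorphic f
autograd_like = backward_rule(f, z)  # what the rule above produces

print(analytic)
print(autograd_like)
```

For z = 1 + 2j the second value comes out close to 4 - 8j, the conjugate of the first, because ∂f/∂z̄ vanishes for a holomorphic function and only the conjugated term survives.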

KFrank · February 21, 2022, 6:44pm

Hi Zhengguo!

[Edit: The link that cahity posted looks like a good explanation.]

I believe that this behavior is by design.

I’m foggy on the details, but I think that this choice is driven by how
you want gradient descent to work with gradients of (real-valued)
losses with respect to complex parameters.

The best (but imperfect) discussion I know of is this github issue:

github.com/pytorch/pytorch

Follow the JAX or Tensorflow convention for meaning of grad() with complex inputs

Opened by ezyang, 06:29PM - 22 Jul 20 UTC; closed 08:24PM - 15 Jan 21 UTC

Labels: high priority, module: autograd, triaged, module: complex

cc #755

JAX and Tensorflow disagree about whether or not the grad of a complex tensor should be conjugated or not. Here is an easy way to see the difference:

```
from jax import grad

def f(z):
    return z * z

z = 0.1j
print(grad(f, holomorphic=True)(z))
```

gives 0.2j

However

```
>>> x = tf.Variable(0. + 0.j)
>>> sess.run(tf.gradients(x*x, x), feed_dict={x:0.1j})
[-0.20000000000000001j]
```

source: https://github.com/tensorflow/tensorflow/issues/3348

PyTorch also has to decide which side of the field it will come down on. Right now on master it implements JAX.

From reading the issue, here is my understanding of the pros and cons:

In favor of TF:

* The gradient is the correct direction for doing gradient descent. This means you can use a "stock" optimizer (one that was written with only real parameters in mind) without any changes to the optimizer. In contrast, to do gradient descent with the JAX definition, you have to remember to conjugate first.

In favor of JAX:

* When the TF definition is implemented directly in the gradient formulas (as opposed to doing a single post facto conjugation), you end up with ugly gradient formulas. For example, take a look at https://github.com/tensorflow/tensorflow/blob/70fd0a4436e3b49139653dc5b85d1c7df23f403d/tensorflow/python/ops/math_grad.py#L453 where TF has to explicitly conjugate the input. With the JAX style definition, you can mostly reuse your real gradient formulas.
* The TF definition is less efficient, since you're doing extra conjugations in the gradient formulas. Even if you write the derivative formulas in the JAX way and then conjugate before plopping the gradient in x.grad, you lose the opportunity to do a fused conjugate-add for the optimizer update.

Gradient formulas look ugly seems like a clear reason to prefer JAX style. Posting this for other opinions.

cc @ezyang @gchanan @zou3519 @SsnL @albanD @gqchen @anjali411 @dylanbespalko @vincentqb

Perhaps @albanD has some updated information or perhaps a link
to more expository documentation.

Best.

K. Frank

albanD · February 24, 2022, 4:59pm

That discussion is indeed pretty accurate.
And indeed the main motivation is that existing optimizers can be re-used unchanged: the update p = p - lr * grad, where lr is real, will move you in the right direction.
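
To make that concrete, here is a small pure-Python sketch (not PyTorch code) of plain gradient descent on a real-valued loss with a single complex parameter. It assumes, following the convention discussed in this thread, that the stored gradient for a real loss is dL/dx + i·dL/dy for p = x + i·y. The loss, target, and learning rate are arbitrary choices made for illustration.

```python
target = 0.0 + 2.0j   # arbitrary value the parameter should reach
lr = 0.1              # real learning rate


def loss(p):
    # Real-valued loss of a complex parameter
    return abs(p - target) ** 2


def grad(p):
    # For L = (x - tx)^2 + (y - ty)^2: dL/dx + 1j * dL/dy = 2 * (p - target)
    return 2.0 * (p - target)


p_conv = 0.0 + 0.0j   # updated with the gradient as stored
p_conj = 0.0 + 0.0j   # updated with the conjugated gradient instead

for _ in range(50):
    p_conv = p_conv - lr * grad(p_conv)
    p_conj = p_conj - lr * grad(p_conj).conjugate()

print(loss(0j), loss(p_conv), loss(p_conj))
```

With these settings the first update rule drives the loss toward zero, while the conjugated one moves away from the target. That is the failure a stock optimizer would hit if the gradient were stored under the other convention without conjugating first.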