I started to play with the autograd function in PyTorch, and wrote the following simple example:

import numpy as np
import torch
dat = np.array([[1. + 2.j, 3. + 4.j], [5. + 6.j, 7. + 8.j]], dtype=np.complex128)
x = torch.tensor(dat, requires_grad=True)
y = 2.0 * x**2
extern_grad = torch.tensor(np.ones_like(dat))
y.backward(gradient=extern_grad)
print(4 * dat)
print(x.grad)

The printed results show that the hand-computed gradient (4 * dat) is the conjugate of what autograd returns (x.grad). I expected them to be identical. Did I use autograd incorrectly?

In PyTorch 1.6.0, I get identical results for both gradient calculations. In a recent PyTorch version (1.10.0), however, I get what you got: the conjugate. That is because autograd now calculates Wirtinger derivatives for complex inputs.
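A quick sketch of this (assuming a recent PyTorch with complex autograd): conjugating x.grad should recover the analytic derivative 4 * dat. Note np.complex128 is used here, since the bare np.complex alias was removed from NumPy.

```python
import numpy as np
import torch

# Same setup as the question, with np.complex replaced by np.complex128.
dat = np.array([[1. + 2.j, 3. + 4.j], [5. + 6.j, 7. + 8.j]],
               dtype=np.complex128)
x = torch.tensor(dat, requires_grad=True)
y = 2.0 * x**2
y.backward(gradient=torch.tensor(np.ones_like(dat)))

# Recent PyTorch stores the conjugate of the holomorphic derivative
# dy/dx = 4x, so x.grad should equal conj(4 * dat).
print(np.allclose(x.grad.numpy(), np.conj(4 * dat)))
```

In other words, the two results differ only by a conjugation, which is exactly the Wirtinger convention at work rather than a bug.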

[Edit: The link that y cahit posted looks like a good explanation.]

I believe that this behavior is by design.

I’m foggy on the details, but I think that this choice is driven by how
you want gradient descent to work with gradients of (real-valued)
losses with respect to complex parameters.

The best (but imperfect) discussion I know of is this GitHub issue:

Perhaps @albanD has some updated information or perhaps a link
to more expository documentation.

That discussion is indeed pretty accurate.
And indeed the main motivation is so that optimizers can be reused unchanged: the plain update p = p - lr * grad, with a real lr, moves a complex parameter in the right direction (steepest descent of a real-valued loss).
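A minimal sketch of that point (a hypothetical example, not from the thread): minimizing the real-valued loss |z - target|^2 over a complex parameter z with the plain update p = p - lr * grad, where lr is real, drives z toward the target because x.grad points along the direction of steepest ascent under the conjugate-Wirtinger convention.

```python
import torch

# Hypothetical setup: fit a complex scalar z to a fixed complex target
# by gradient descent on the real-valued loss |z - target|^2.
target = torch.tensor(3.0 - 4.0j)
z = torch.tensor(0.0 + 0.0j, requires_grad=True)
lr = 0.1

for _ in range(100):
    loss = (z - target).abs() ** 2   # real-valued loss of a complex parameter
    loss.backward()
    with torch.no_grad():
        z -= lr * z.grad             # real lr times the stored gradient
    z.grad = None

print(z.detach())  # converges toward 3 - 4j
```

If autograd stored the unconjugated Wirtinger derivative instead, this same update rule would not descend the loss, which is the design trade-off the discussion describes.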