Auto gradient computation

I have a fundamental question here. Consider an objective function F[G(b)], where F is a function of a matrix G, with F[G] = trace(inv(G)), and G(b) is a function that takes the vector b as input. I’m using the autograd function to compute the numerical gradient of F with respect to b, i.e. \Delta F / \Delta b. In my configuration, the vector b is double-typed and the objective function F produces a complex number, but the imaginary part is extremely small and can be ignored (it is usually around 1e-20, which I think is just computational round-off error). Under this setting, the gradient I get for the vector b is a complex number. I would like to know whether this is a possible situation or whether it is caused by some setting of the autograd function.
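
To make this concrete, here is a tiny toy script (not my actual computation, just an illustration of the behaviour I mean): b starts out real, gets cast to complex128 before entering the graph, and the gradient that comes back is complex with a tiny imaginary part.

import torch

# Toy illustration only: a mathematically real objective built from complex
# intermediates, with the real vector b cast to complex128 first.
b = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)
b_c = b.to(torch.complex128).requires_grad_()

G = torch.diagflat(b_c) + 1e-12j * torch.eye(3, dtype=torch.complex128)
obj = torch.trace(torch.inverse(G))   # essentially real; imaginary part is tiny

obj.real.backward()                   # take the real part so backward() gets a real scalar
print(obj)                            # real part ~ 1.83, imaginary part ~ 1e-12
print(b_c.grad)                       # dtype is complex128, imaginary parts are tiny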

Kevin

Hi Kevin!

Could you post a short, self-contained, runnable script that illustrates
your issue?

Where in the process does the real b get turned into a complex F?
Is G already complex?

You mention that F is essentially real (very small imaginary part).
Mathematically speaking, should F be purely real?

Best.

K. Frank

The script I’m running is a MATLAB script that calls a Python function, so it is a bit complicated to post it all. But the Python function I’m running is here:

import torch


def F_Grad(Ch, CH, CW, F, b):
    # Convert the NumPy inputs to tensors. Note that the real vector b is cast
    # to complex128 before it enters the graph, so autograd treats it as a
    # complex leaf.
    Ch_Ten = torch.from_numpy(Ch).to(torch.complex128)
    b_Ten = torch.from_numpy(b).to(torch.complex128)
    b_Ten.requires_grad_()
    B_Ten = torch.diagflat(b_Ten).to(torch.complex128)
    F_Ten = torch.from_numpy(F)
    F_H_Ten = torch.transpose(torch.conj(F_Ten), 0, 1)
    A_Ten = torch.matmul(B_Ten, F_Ten)
    A_H_Ten = torch.transpose(torch.conj(A_Ten), 0, 1)
    CH_Ten = torch.from_numpy(CH).to(torch.complex128)
    CW_Ten = torch.from_numpy(CW).to(torch.complex128)

    # Build G = Ch_Tilde A^H (A Ch_Tilde A^H + CW)^(-1), where Ch_Tilde keeps
    # only the diagonal of A^H CH A.
    Ch_Tilde = torch.diagflat(torch.diagonal(torch.matmul(A_H_Ten, torch.matmul(CH_Ten, A_Ten)), 0))
    G_Temp1 = torch.matmul(A_Ten, torch.matmul(Ch_Tilde, A_H_Ten))
    G_Temp2 = torch.inverse(torch.add(G_Temp1, CW_Ten))
    G = torch.matmul(Ch_Tilde, torch.matmul(A_H_Ten, G_Temp2))
    GA = torch.matmul(G, A_Ten)

    # First objective term: trace((F G A - F) Ch (F G A - F)^H).
    F_OBJ1_Temp1 = torch.sub(torch.matmul(F_Ten, GA), F_Ten, alpha=1)
    F_OBJ1_H_Temp1 = torch.transpose(torch.conj(F_OBJ1_Temp1), 0, 1)
    F_OBJ1_Final = torch.trace(torch.matmul(F_OBJ1_Temp1, torch.matmul(Ch_Ten, F_OBJ1_H_Temp1)))

    # Second objective term: trace(F G CW (F G)^H).
    F_OBJ2_Temp1 = torch.matmul(F_Ten, G)
    F_OBJ2_H_Temp1 = torch.transpose(torch.conj(F_OBJ2_Temp1), 0, 1)
    F_OBJ2_Final = torch.trace(torch.matmul(F_OBJ2_Temp1, torch.matmul(CW_Ten, F_OBJ2_H_Temp1)))

    # Total objective (complex128 with a tiny imaginary part) and its gradient
    # with respect to the complex leaf b_Ten.
    F_OBJ_tensor = F_OBJ1_Final + F_OBJ2_Final
    F_OBJ_tensor.backward()
    b_grad_tensor = b_Ten.grad
    b_grad = b_grad_tensor.numpy()
    F_OBJ = F_OBJ_tensor.detach().numpy()
    return [F_OBJ, b_grad]

There are five input variables: Ch is a diagonal real matrix, CH is a symmetric complex matrix, CW is a diagonal real matrix, F is a DFT matrix, and b is a real vector.

Mathematically, the objective function should be purely real, because its physical meaning is a mean squared error.
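
To spell out why: writing E = F*G*A - F, the two terms of the objective are trace(E Ch E^H) and trace((F G) CW (F G)^H). Since Ch and CW are real diagonal matrices (hence Hermitian), each product of the form X C X^H is Hermitian, so each trace equals its own complex conjugate and is therefore real.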

Hi Kevin!

Okay, that was opaque …

Does “DFT” mean “discrete Fourier transform?” Is F also complex?

If the objective function is mathematically purely real, I would try to
reorganize the calculation so that all the intermediate results are purely
real, as well. Doing so would prevent “mathematically zero” imaginary parts
from creeping in through round-off error.

If you can do this, your gradient will remain real.

To do so, you might have to express complex objects such as CH
explicitly in terms of their real and imaginary parts (e.g., CH_Real
and CH_Imag) so that everything would be torch.float64 and
nothing would be torch.complex128.
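
As a rough sketch (the helper name cmatmul and the shapes here are made up, just to illustrate the idea), a complex matrix product can be carried out with two float64 tensors per matrix, so a float64 b keeps a float64 gradient:

import torch

# Carry real and imaginary parts separately. For A = A_Real + i*A_Imag and
# B = B_Real + i*B_Imag:
#   (A @ B).real = A_Real @ B_Real - A_Imag @ B_Imag
#   (A @ B).imag = A_Real @ B_Imag + A_Imag @ B_Real
def cmatmul(A_Real, A_Imag, B_Real, B_Imag):
    return (A_Real @ B_Real - A_Imag @ B_Imag,
            A_Real @ B_Imag + A_Imag @ B_Real)

b = torch.randn(4, dtype=torch.float64, requires_grad=True)
CH_Real = torch.randn(4, 4, dtype=torch.float64)
CH_Imag = torch.randn(4, 4, dtype=torch.float64)

B = torch.diagflat(b)          # B is real, so its imaginary part is zero
prod_real, prod_imag = cmatmul(B, torch.zeros_like(B), CH_Real, CH_Imag)

# A mathematically real scalar assembled only from real tensors
# (the squared Frobenius norm of the complex product B @ CH):
loss = (prod_real ** 2 + prod_imag ** 2).sum()
loss.backward()
print(b.grad.dtype)            # torch.float64 -- the gradient stays real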

Best.

K. Frank

Yes, F is the discrete Fourier transform matrix. Thanks for your suggestion, by the way. I would also like to know, at a conceptual level: is it true that as long as the result of the objective function is strictly real, the gradient of the function with respect to a real vector should be real as well?

Best

Kevin