# How to implement a custom loss function which includes the Frobenius norm?

Hi,

I am using Pytorch to build and train a multilayer perceptron that maps a 450-dimensional input to a 120-dimensional output. The data I have is actually a complex matrix, so I vectorize it and concatenate its real and imaginary parts (which is why my input and output have such high dimensions). I am now hoping to use a customized loss function that includes the matrix Frobenius norm between the predicted results and the target.

I searched the forum a lot but am still not sure how to implement this: I don't know whether pytorch will do the auto-gradient for it, or whether I also need to do something for the back-propagation of the customized loss function. At the same time, when dealing with this kind of complex matrix data, is there a better way to handle it? And apart from MSE, is there a good loss function for this kind of regression problem?

Should I figure out how the derivative will affect each element and implement the backpropagation by hand?

Hello Yuchen!

Complex tensors are still a work in progress in pytorch, with more
functionality (and more functionality that is actually correct) being
added with each new release.

Note that as of version 1.6.0, I don’t believe that the pytorch optimizers
accept complex `Parameter`s, so to use pytorch’s complex machinery,
you will have to either use real `Parameter`s that you combine into
complex tensors or write your own complex-aware optimizer.

All in all, depending on what you are doing, it might be safest to
represent the real and imaginary parts of your complex tensors as
separate real tensors and carry out the complex arithmetic “by hand.”
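As a sketch of this "by hand" approach (the tensor names here are just illustrative), a complex matrix can be stored as two real tensors, and the complex arithmetic carried out with ordinary real tensor operations that autograd handles without any issues:

```python
import torch

# a complex 2 x 3 matrix stored as two real tensors (illustrative names)
real = torch.randn (2, 3, requires_grad = True)
imag = torch.randn (2, 3, requires_grad = True)

# Frobenius norm: square root of the sum of the squared absolute
# values of the elements, using |a + bi|^2 = a^2 + b^2
fnorm = torch.sqrt ((real**2 + imag**2).sum())

fnorm.backward()   # autograd populates real.grad and imag.grad
```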

The Frobenius norm of a (complex) matrix is simply the square root
of the sum of the squares of the (absolute values of the) individual
matrix elements. Pytorch's tensor operations can do this* reasonably
straightforwardly.

*) With the proviso that complex tensors are a work in progress.

Note that as of version 1.6.0, `torch.norm()` is incorrect for complex
tensors – it uses the squares, rather than the squared absolute values,
of the matrix elements.

Here is a script that illustrates calculating and backpropagating the
Frobenius norm:

```python
import torch
torch.__version__

_ = torch.random.manual_seed (2020)

x = torch.randn ([2, 3])
print ('x = ...\n', x)
print ('torch.norm (x) =', torch.norm (x))   # okay
z = torch.randn ([2, 3], dtype = torch.cfloat, requires_grad = True)
print ('z = ...\n', z)
print ('torch.norm (z) =', torch.norm (z))   # oops, should be positive real
znorm = torch.sqrt ((z * z.conj()).sum())
print ('znorm =', znorm)
znorm.backward()
print ('z.grad =', z.grad)
z.grad.zero_()   # clear the gradient before the second backward pass
znormb = torch.sqrt ((torch.real (z)**2).sum() + (torch.imag (z)**2).sum())
print ('znormb =', znormb)
znormb.backward()
print ('z.grad =', z.grad)
```

And here is its (version 1.6.0) output:

```
x = ...
 tensor([[ 1.2372, -0.9604,  1.5415],
        [-0.4079,  0.8806,  0.0529]])
torch.norm (x) = tensor(2.4029)
z = ...
 tensor([[ 0.0531+0.3378j, -0.4779-1.5195j, -0.8105-0.1923j],
        [ 0.7118-0.0294j, -0.9088-0.3499j, -0.9167-0.8840j]])
torch.norm (z) = tensor(1.3644+1.4714j)
z.grad = tensor([[ 0.0210-0.1332j, -0.1885+0.5994j, -0.3197+0.0759j],
        [ 0.2808+0.0116j, -0.3585+0.1380j, -0.3616+0.3487j]])
z.grad = tensor([[ 0.0210-0.1332j, -0.1885+0.5994j, -0.3197+0.0759j],
        [ 0.2808+0.0116j, -0.3585+0.1380j, -0.3616+0.3487j]])
```

Be careful, however, with what you do with a complex gradient. You
will have to take the complex conjugate of the gradient to use it in
a gradient-descent step.

This script illustrates this behavior by minimizing the Frobenius norm
of a complex tensor with plain gradient descent:

```python
import torch
torch.__version__

_ = torch.random.manual_seed (2020)

za = torch.randn ([2, 3], dtype = torch.cfloat, requires_grad = True)
zb = za.detach().clone().requires_grad_()

lr = 0.001

print ('za =', za)
for  i in range (10001):
    znorm = torch.sqrt ((za * za.conj()).sum())
    znorm.backward()
    with torch.no_grad():
        _ = za.copy_ (za - lr * za.grad)   # doesn't converge
    za.grad.zero_()
    if  i % 1000 == 0:  print ('znorm =', znorm)

print ('za =', za)

print ('zb =', zb)
for  i in range (10001):
    znorm = torch.sqrt ((zb * zb.conj()).sum())
    znorm.backward()
    with torch.no_grad():
        _ = zb.copy_ (zb - lr * zb.grad.conj())   # use conjugate of gradient to get convergence
    zb.grad.zero_()
    if  i % 1000 == 0:  print ('znorm =', znorm)

print ('zb =', zb)
```

And here is its output:

```
za = tensor([[ 0.8749-0.6791j,  1.0900-0.2884j,  0.6227+0.0374j],
        [ 0.0531+0.3378j, -0.4779-1.5195j, -0.8105-0.1923j]],
za = tensor([[ 0.1324-4.4859j,  0.1649-1.9052j,  0.0942+0.2472j],
        [ 0.0080+2.2312j, -0.0723-10.0381j, -0.1226-1.2704j]],
zb = tensor([[ 0.8749-0.6791j,  1.0900-0.2884j,  0.6227+0.0374j],
        [ 0.0531+0.3378j, -0.4779-1.5195j, -0.8105-0.1923j]],
zb = tensor([[ 7.4477e-06-5.7811e-06j,  9.2789e-06-2.4552e-06j,
          5.3007e-06+3.1863e-07j],
        [ 4.5219e-07+2.8753e-06j, -4.0686e-06-1.2936e-05j,
```

You can see that without taking its complex conjugate, the gradient
pushes the imaginary parts of the tensor away from zero so that the
Frobenius norm grows.

As long as you write your custom loss function using pytorch tensor
operations, you will get autograd and backpropagation "for free" (that
is, you will not have to write an explicit `.backward()` function for
your loss function).

This will certainly be true if you represent the real and imaginary parts
of your complex tensors as explicit real tensors.
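For instance, such a loss can be an ordinary python function built only from tensor operations (the function name and the real / imaginary layout below are just for illustration):

```python
import torch

def frobenius_loss (pred, target):
    # pred / target: (batch, 20) tensors where the first 10 entries of
    # each row are the real part and the last 10 the imaginary part
    # (this layout is illustrative, not prescribed)
    diff_real = pred[:, :10] - target[:, :10]
    diff_imag = pred[:, 10:] - target[:, 10:]
    # squared Frobenius norm of the complex difference, averaged over the batch
    return (diff_real**2 + diff_imag**2).sum (dim = 1).mean()

pred = torch.randn (4, 20, requires_grad = True)
target = torch.randn (4, 20)

loss = frobenius_loss (pred, target)
loss.backward()   # autograd supplies the backward pass
```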

However, I think it's worth trying pytorch's complex tensors. If you
decide to go this route, you should use the latest version of pytorch
that otherwise works for you, and test your complex manipulations
carefully, especially the various functions you use and backpropagation.

Good luck.

K. Frank

Hi Frank,

Thank you very much for taking time to write this detailed reply!

Yes, I split the real and imaginary parts of the original complex matrix and then vectorize them. At the output side, I build a real matrix (the real part of the complex matrix) and an imaginary matrix (the imaginary part).

Specifically, I try to calculate the square of the Frobenius norm between the conjugate transpose of the output and the target. Here is what I did:

```python
# First matrix V_1_Hermitian (2*5)
real_part_1 = output_from_network[i,0:5].float()
real_part_2 = output_from_network[i,5:10].float()
img_part_1 = -output_from_network[i,10:15].float()
img_part_2 = -output_from_network[i,15:20].float()
# A
real_part_final = torch.stack((real_part_1,real_part_2),0)
# B
img_part_final = torch.stack((img_part_1,img_part_2),0)

# Second matrix V_2 (5*2)
real_part_1_ = torch.reshape(target[i,0:5].float(), [5,1])
real_part_2_ = torch.reshape(target[i,5:10].float(), [5,1])
img_part_1_ = torch.reshape(target[i,10:15].float(), [5,1])
img_part_2_ = torch.reshape(target[i,15:20].float(), [5,1])
# C
real_part_final_ = torch.cat((real_part_1_,real_part_2_),1)
# D
img_part_final_ = torch.cat((img_part_1_,img_part_2_),1)

# M = V_1_Hermitian * V_2
# M = Re(M) + Im(M)*j
# Re(M) = A*C-B*D, Im(M) = A*D+B*C
# frobenius norm square of M = frobenius norm square of Re(M) + frobenius norm square of Im(M)
Re_M = torch.matmul(real_part_final, real_part_final_) - torch.matmul(img_part_final, img_part_final_)
Im_M = torch.matmul(real_part_final, img_part_final_) + torch.matmul(img_part_final, real_part_final_)
Re_M_norm_square = torch.pow(torch.norm(Re_M,'fro'),2)
Im_M_norm_square = torch.pow(torch.norm(Im_M,'fro'),2)
Final_M_norm_square = Re_M_norm_square + Im_M_norm_square
```

Sorry about the mess. The output and the target are vectors of size 20 (corresponding to a 5*2 matrix), where the first 10 elements are the real part and the remaining 10 are the imaginary part. Since I wasn't sure beforehand whether Pytorch could do the auto-gradient 'for free', I wrote the process of calculating this Frobenius norm 'by hand', to some extent (as shown in the code, really messy). Will this code work?

Actually, I verified that the final output, `Final_M_norm_square`, is the value I want (so my manipulation gives the correct result - the squared Frobenius norm between two complex matrices). My concern is whether the gradients will be calculated precisely during backpropagation, since instead of building the complex matrix from the real and imaginary parts and calculating the Frobenius norm as shown in your script example, I calculate it directly from the real and imaginary parts, respectively.
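For what it's worth, the same computation can be written a bit more compactly (purely a sketch; `out` and `tgt` here stand in for one row of `output_from_network` and `target`, with the same length-20 real/imaginary layout):

```python
import torch

# stand-ins for one sample's network output and target: length-20
# vectors, first 10 entries real parts, last 10 imaginary parts
out = torch.randn (20, requires_grad = True)
tgt = torch.randn (20)

# V_1^H: rows of the (2, 5) matrix; the conjugate negates the imaginary part
a = out[:10].reshape (2, 5)
b = -out[10:].reshape (2, 5)
# V_2: (5, 2), built column-wise from the target vector
c = tgt[:10].reshape (2, 5).t()
d = tgt[10:].reshape (2, 5).t()

# M = V_1^H @ V_2;  ||M||_F^2 = ||Re(M)||_F^2 + ||Im(M)||_F^2
re_m = a @ c - b @ d
im_m = a @ d + b @ c
fro_sq = (re_m**2).sum() + (im_m**2).sum()

fro_sq.backward()   # gradients flow back to `out` automatically
```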

Again, thank you very much for taking the time to look at my question!

Hello Yuchen!

I haven’t looked at your code in detail.

But from what you say, my understanding is that:

1. You are not using any complex tensors, so you don’t have
to worry about any pytorch complex issues.

2. Your calculation only uses pytorch tensor operations (i.e.,
you don’t switch over to something like numpy for part of
the calculation).

3. Your function gives you the correct "forward-pass" result.

If these three things are true, then autograd will compute the correct
gradients for you during backpropagation.
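If you want to double-check the backward pass as well, `torch.autograd.gradcheck` compares autograd's gradients against numerical finite differences. A sketch (the function `fro_sq` here is a simplified stand-in for your loss, not your actual code; gradcheck wants double-precision inputs):

```python
import torch

def fro_sq (real_part, imag_part):
    # squared Frobenius norm computed from separate real tensors:
    # sum of a^2 + b^2 over all matrix elements
    return (real_part**2).sum() + (imag_part**2).sum()

# gradcheck requires double precision and requires_grad = True
re = torch.randn (2, 5, dtype = torch.double, requires_grad = True)
im = torch.randn (2, 5, dtype = torch.double, requires_grad = True)

print (torch.autograd.gradcheck (fro_sq, (re, im)))   # prints True if gradients match
```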

Thank you very much! Really appreciate your help! 