# How to implement a custom loss function which includes the Frobenius norm?

Hi,

I am using Pytorch to build and train a multilayer perceptron that maps a 450-dimensional input to a 120-dimensional output. The data I have is actually a complex matrix, so I vectorize it and concatenate its real and imaginary parts (which is why my input and output have such high dimensions). I am now hoping to use a customized loss function that includes the matrix Frobenius norm between the predicted results and the target.

I searched the forum a lot but am still not sure how to implement this: I don't know whether pytorch will do the auto-gradient for it, or whether I also need to do something for the back-propagation of the customized loss function. At the same time, when dealing with this kind of complex matrix data, is there a better way to handle it? And apart from MSE, is there a good loss function for this kind of regression problem?

Should I figure out how the derivative will affect each element and implement the backpropagation by hand?

Hello Yuchen!

Complex tensors are still a work in progress in pytorch, with more
functionality (and more functionality that is actually correct) being
added with each new release.

Note that as of version 1.6.0, I don’t believe that the pytorch optimizers
accept complex `Parameter`s, so to use pytorch’s complex machinery,
you will have to either use real `Parameter`s that you combine into
complex tensors or write your own complex-aware optimizer.

All in all, depending on what you are doing, it might be safest to
represent the real and imaginary parts of your complex tensors as
separate real tensors and carry out the complex arithmetic “by hand.”
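As a sketch of this "by hand" approach (the tensor names here are just illustrative), a complex matrix can be stored as two real tensors, and the complex arithmetic carried out with ordinary real tensor operations that autograd handles without any issues:

```python
import torch

# a complex 2 x 3 matrix stored as two real tensors (illustrative names)
real = torch.randn (2, 3, requires_grad = True)
imag = torch.randn (2, 3, requires_grad = True)

# Frobenius norm: square root of the sum of the squared absolute
# values of the elements, using |a + bi|^2 = a^2 + b^2
fnorm = torch.sqrt ((real**2 + imag**2).sum())

fnorm.backward()   # autograd populates real.grad and imag.grad
```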

The Frobenius norm of a (complex) matrix is simply the square root
of the sum of the squares of the (absolute values of the) individual
matrix elements. Pytorch's tensor operations can do this* reasonably
straightforwardly.

*) With the proviso that complex tensors are a work in progress.

Note that as of version 1.6.0, `torch.norm()` is incorrect for complex
tensors – it uses the squares, rather than the squared absolute values,
of the matrix elements.

Here is a script that illustrates calculating and backpropagating the
Frobenius norm:

```python
import torch
torch.__version__

_ = torch.random.manual_seed (2020)

x = torch.randn ([2, 3])
print ('x = ...\n', x)
print ('torch.norm (x) =', torch.norm (x))   # okay
z = torch.randn ([2, 3], dtype = torch.cfloat, requires_grad = True)
print ('z = ...\n', z)
print ('torch.norm (z) =', torch.norm (z))   # oops, should be positive real
znorm = torch.sqrt ((z * z.conj()).sum())
print ('znorm =', znorm)
znorm.backward()
print ('z.grad =', z.grad)
z.grad.zero_()   # clear the gradient before the second backward pass
znormb = torch.sqrt ((torch.real (z)**2).sum() + (torch.imag (z)**2).sum())
print ('znormb =', znormb)
znormb.backward()
print ('z.grad =', z.grad)
```

And here is its (version 1.6.0) output:

```
x = ...
 tensor([[ 1.2372, -0.9604,  1.5415],
        [-0.4079,  0.8806,  0.0529]])
torch.norm (x) = tensor(2.4029)
z = ...
 tensor([[ 0.0531+0.3378j, -0.4779-1.5195j, -0.8105-0.1923j],
        [ 0.7118-0.0294j, -0.9088-0.3499j, -0.9167-0.8840j]])
torch.norm (z) = tensor(1.3644+1.4714j)
z.grad = tensor([[ 0.0210-0.1332j, -0.1885+0.5994j, -0.3197+0.0759j],
        [ 0.2808+0.0116j, -0.3585+0.1380j, -0.3616+0.3487j]])
z.grad = tensor([[ 0.0210-0.1332j, -0.1885+0.5994j, -0.3197+0.0759j],
        [ 0.2808+0.0116j, -0.3585+0.1380j, -0.3616+0.3487j]])
```

Be careful, however, with what you do with a complex gradient. You
will have to take the complex conjugate of the gradient to use it in
a gradient-descent step.

This script illustrates this behavior by minimizing the Frobenius norm
of a complex tensor with plain gradient descent:

```python
import torch
torch.__version__

_ = torch.random.manual_seed (2020)

za = torch.randn ([2, 3], dtype = torch.cfloat, requires_grad = True)
zb = za.detach().clone().requires_grad_()

lr = 0.001

print ('za =', za)
for  i in range (10001):
    znorm = torch.sqrt ((za * za.conj()).sum())
    znorm.backward()
    with torch.no_grad():
        _ = za.copy_ (za - lr * za.grad)   # doesn't converge
    za.grad.zero_()
    if  i % 1000 == 0:  print ('znorm =', znorm)

print ('za =', za)

print ('zb =', zb)
for  i in range (10001):
    znorm = torch.sqrt ((zb * zb.conj()).sum())
    znorm.backward()
    with torch.no_grad():
        _ = zb.copy_ (zb - lr * zb.grad.conj())   # use conjugate of gradient to get convergence
    zb.grad.zero_()
    if  i % 1000 == 0:  print ('znorm =', znorm)

print ('zb =', zb)
```

And here is its output:

```
za = tensor([[ 0.8749-0.6791j,  1.0900-0.2884j,  0.6227+0.0374j],
        [ 0.0531+0.3378j, -0.4779-1.5195j, -0.8105-0.1923j]],
za = tensor([[ 0.1324-4.4859j,  0.1649-1.9052j,  0.0942+0.2472j],
        [ 0.0080+2.2312j, -0.0723-10.0381j, -0.1226-1.2704j]],
zb = tensor([[ 0.8749-0.6791j,  1.0900-0.2884j,  0.6227+0.0374j],
        [ 0.0531+0.3378j, -0.4779-1.5195j, -0.8105-0.1923j]],
zb = tensor([[ 7.4477e-06-5.7811e-06j,  9.2789e-06-2.4552e-06j,
          5.3007e-06+3.1863e-07j],
        [ 4.5219e-07+2.8753e-06j, -4.0686e-06-1.2936e-05j,
```

You can see that without taking its complex conjugate, the gradient
pushes the imaginary parts of the tensor away from zero so that the
Frobenius norm grows.

As long as you write your custom loss function using pytorch tensor
operations, you will get autograd and backpropagation "for free" (that
is, you will not have to write an explicit `.backward()` function for
your loss function).

This will certainly be true if you represent the real and imaginary parts
of your complex tensors as explicit real tensors.
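For instance, such a loss can be an ordinary python function built only from tensor operations (the function name and the real / imaginary layout below are just for illustration):

```python
import torch

def frobenius_loss (pred, target):
    # pred / target: (batch, 20) tensors where the first 10 entries of
    # each row are the real part and the last 10 the imaginary part
    # (this layout is illustrative, not prescribed)
    diff_real = pred[:, :10] - target[:, :10]
    diff_imag = pred[:, 10:] - target[:, 10:]
    # squared Frobenius norm of the complex difference, averaged over the batch
    return (diff_real**2 + diff_imag**2).sum (dim = 1).mean()

pred = torch.randn (4, 20, requires_grad = True)
target = torch.randn (4, 20)

loss = frobenius_loss (pred, target)
loss.backward()   # autograd supplies the backward pass
```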

However, I think it's worth trying pytorch's complex tensors. If you
decide to go this route, you should use the latest version of pytorch
that otherwise works for you, and test your complex manipulations
carefully, especially the various functions you use and backpropagation.

Good luck.

K. Frank

Hi Frank,

Thank you very much for taking time to write this detailed reply!

Yes, I split the real and imaginary parts of the original complex matrix and then vectorize them. At the output side, I build a real matrix (the real part of the complex matrix) and an imaginary matrix (the imaginary part).

Specifically, I try to calculate the square of the Frobenius norm between the conjugate transpose of the output and the target. Here is what I did:

```python
# First matrix V_1_Hermitian (2*5)
real_part_1 = output_from_network[i,0:5].float()
real_part_2 = output_from_network[i,5:10].float()
img_part_1 = -output_from_network[i,10:15].float()
img_part_2 = -output_from_network[i,15:20].float()
# A
real_part_final = torch.stack((real_part_1,real_part_2),0)
# B
img_part_final = torch.stack((img_part_1,img_part_2),0)

# Second matrix V_2 (5*2)
real_part_1_ = torch.reshape(target[i,0:5].float(), [5,1])
real_part_2_ = torch.reshape(target[i,5:10].float(), [5,1])
img_part_1_ = torch.reshape(target[i,10:15].float(), [5,1])
img_part_2_ = torch.reshape(target[i,15:20].float(), [5,1])
# C
real_part_final_ = torch.cat((real_part_1_,real_part_2_),1)
# D
img_part_final_ = torch.cat((img_part_1_,img_part_2_),1)

# M = V_1_Hermitian * V_2
# M = Re(M) + Im(M)*j
# Re(M) = A*C-B*D, Im(M) = A*D+B*C
# frobenius norm square of M = frobenius norm square of Re(M) + frobenius norm square of Im(M)
Re_M = torch.matmul(real_part_final, real_part_final_) - torch.matmul(img_part_final, img_part_final_)
Im_M = torch.matmul(real_part_final, img_part_final_) + torch.matmul(img_part_final, real_part_final_)
Re_M_norm_square = torch.pow(torch.norm(Re_M,'fro'),2)
Im_M_norm_square = torch.pow(torch.norm(Im_M,'fro'),2)
Final_M_norm_square = Re_M_norm_square + Im_M_norm_square
```

Sorry about the mess. The output and the target are vectors of size 20 (corresponding to a 5*2 matrix), where the first 10 elements are the real part and the remaining 10 are the imaginary part. Since I wasn't sure beforehand whether Pytorch could do the auto-gradient 'for free', I wrote the process of calculating this Frobenius norm 'by hand', to some extent (as shown in the code, really messy). Will this code work?

Actually, I verified that the final output, `Final_M_norm_square`, is the value I want (so my manipulation gives the correct result - the squared Frobenius norm between two complex matrices). My concern is whether the gradients will be calculated precisely during backpropagation, since instead of building the complex matrix from the real and imaginary parts and calculating the Frobenius norm as shown in your script example, I calculate it directly from the real and imaginary parts, respectively.
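For what it's worth, the same computation can be written a bit more compactly (purely a sketch; `out` and `tgt` here stand in for one row of `output_from_network` and `target`, with the same length-20 real/imaginary layout):

```python
import torch

# stand-ins for one sample's network output and target: length-20
# vectors, first 10 entries real parts, last 10 imaginary parts
out = torch.randn (20, requires_grad = True)
tgt = torch.randn (20)

# V_1^H: rows of the (2, 5) matrix; the conjugate negates the imaginary part
a = out[:10].reshape (2, 5)
b = -out[10:].reshape (2, 5)
# V_2: (5, 2), built column-wise from the target vector
c = tgt[:10].reshape (2, 5).t()
d = tgt[10:].reshape (2, 5).t()

# M = V_1^H @ V_2;  ||M||_F^2 = ||Re(M)||_F^2 + ||Im(M)||_F^2
re_m = a @ c - b @ d
im_m = a @ d + b @ c
fro_sq = (re_m**2).sum() + (im_m**2).sum()

fro_sq.backward()   # gradients flow back to `out` automatically
```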

Again, thank you very much for taking the time to look at my question!

Hello Yuchen!

I haven’t looked at your code in detail.

But from what you say, my understanding is that:

1. You are not using any complex tensors, so you don’t have
to worry about any pytorch complex issues.

2. Your calculation only uses pytorch tensor operations (i.e.,
you don’t switch over to something like numpy for part of
the calculation).

3. Your function gives you the correct "forward-pass" result.

If these three things are true, then autograd will compute the correct
gradients for you during backpropagation.
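If you want to double-check the backward pass as well, `torch.autograd.gradcheck` compares autograd's gradients against numerical finite differences. A sketch (the function `fro_sq` here is a simplified stand-in for your loss, not your actual code; gradcheck wants double-precision inputs):

```python
import torch

def fro_sq (real_part, imag_part):
    # squared Frobenius norm computed from separate real tensors:
    # sum of a^2 + b^2 over all matrix elements
    return (real_part**2).sum() + (imag_part**2).sum()

# gradcheck requires double precision and requires_grad = True
re = torch.randn (2, 5, dtype = torch.double, requires_grad = True)
im = torch.randn (2, 5, dtype = torch.double, requires_grad = True)

print (torch.autograd.gradcheck (fro_sq, (re, im)))   # prints True if gradients match
```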

Thank you very much! Really appreciate your help! 