I have an output vector and a ground-truth vector. I want to minimize the cosine distance between them, but I have a constraint: the output vector should have an L2 norm of 1. So I created the following custom loss:

Outputs is a batch of 32 vectors, each with 60 dimensions.
So I am trying to minimize cosine distance + |1 - norm(output)|.
But when I add the L1Loss term, the network does not learn anything, whereas without it I get an accuracy of 98%.
So I guess I am not using the L1 loss appropriately.
Can you please advise how I can solve this?

Try looking at the values that the different losses take. I did not try to reproduce your code or anything, but when I use multiple losses like this, I usually do it like:

loss1 + alpha * loss2 + beta * loss3 + …

alpha and beta are set in the init and should be tuned as hyperparameters. Hope that helps.
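
As a rough sketch of that weighting applied to the loss described in the question (not the original poster's code): the function name combined_loss, the default alpha=0.1, and the random tensors are all assumptions made for illustration. Returning the two terms separately makes it easy to compare their magnitudes.

```python
import torch
import torch.nn.functional as F

def combined_loss(output, target, alpha=0.1):
    # Hypothetical helper: cosine distance plus a weighted unit-norm penalty.
    # Cosine distance per sample, 1 - cos(output, target), averaged over the batch.
    cos_dist = (1.0 - F.cosine_similarity(output, target, dim=1)).mean()
    # Mean absolute deviation of each vector's L2 norm from 1.
    norm_penalty = (output.norm(p=2, dim=1) - 1.0).abs().mean()
    return cos_dist + alpha * norm_penalty, cos_dist, norm_penalty

torch.manual_seed(0)
output = torch.randn(32, 60, requires_grad=True)  # stand-in for network outputs
target = torch.randn(32, 60)                      # stand-in for ground truth

total, cos_dist, norm_penalty = combined_loss(output, target, alpha=0.1)
print(cos_dist.item(), norm_penalty.item())  # inspect the scale of each term
total.backward()
```

If one term is much larger than the other, it can dominate the gradient, which is one reason to tune alpha rather than leave both terms at weight 1.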

It looks like you’re mixing up L1 and L2 losses and norms, both in your description and in your code (although it’s a bit hard to tell since the variable names are inconsistent).
If you want your vectors to be L2-normalized, the corresponding loss is MSELoss.
If you want your vectors to be L1-normalized then the p in output1.norm should be 1.
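
To make the distinction concrete, here is a small sketch on a toy vector (the vector [3, 4] and the variable names are made up for illustration). The p argument of norm selects which norm of the vector is computed, while L1Loss and MSELoss only differ in how the deviation of that norm from 1 is penalized:

```python
import torch
import torch.nn as nn

v = torch.tensor([[3.0, 4.0]])      # toy batch with one 2-dim vector

l2 = v.norm(p=2, dim=1)             # L2 norm: sqrt(3^2 + 4^2) = 5
l1 = v.norm(p=1, dim=1)             # L1 norm: |3| + |4| = 7

ones = torch.ones_like(l2)          # target norm of 1 per vector
abs_pen = nn.L1Loss()(l2, ones)     # mean |norm - 1|   = 4
sq_pen = nn.MSELoss()(l2, ones)     # mean (norm - 1)^2 = 16

print(l2.item(), l1.item(), abs_pen.item(), sq_pen.item())
```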

The main idea is that I want to minimize the cosine distance and also have an L2 norm of 1.
So there are 2 stages:
output is a batch of vectors, so first I calculate the L2 norm of each vector:

output1_normed = output1.norm(p=2, dim=1)

and then I want to add a constraint that each norm should be 1, so I use: