I have a probability vector and I want to make the output of my network close to it (its dimension is high). The output of my network is the dot product of two input vectors, with a Softmax applied afterwards so that the output is a probability vector. Now I don’t know what the best loss function is for comparing these two. I tried BCE loss, but since the dimension is high, almost all elements are below 0.5, so I think the BCE loss mostly pushes every element toward zero and little else matters. I also tried MSE and L1 losses, but they don’t take into account that the output is a probability vector. Is there a loss function for this case?

thanks

You could try the Kullback-Leibler divergence loss. Here is the PyTorch documentation for this loss function. I’m assuming you are not dealing with a classification problem (i.e. the target or ground-truth probabilities are not one-hot encoded); otherwise, you could use a cross-entropy loss.
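A minimal sketch of how this could look, assuming a batch of logits (the dot products) and target probability vectors; note that PyTorch's KL-divergence loss expects the *input* in log-space, so `log_softmax` is used rather than `softmax` followed by `log`:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical shapes: batch of 4 samples, distributions over 1000 elements.
logits = torch.randn(4, 1000)                      # e.g. the dot-product outputs
target = F.softmax(torch.randn(4, 1000), dim=-1)   # ground-truth probability vectors

# kl_div expects log-probabilities as input; log_softmax is more
# numerically stable than softmax(...).log().
log_probs = F.log_softmax(logits, dim=-1)

# 'batchmean' averages the per-sample KL divergence over the batch,
# matching the mathematical definition.
loss = F.kl_div(log_probs, target, reduction='batchmean')
print(loss.item())
```

The loss is non-negative and reaches zero exactly when the predicted distribution matches the target, which is the behavior BCE/MSE were failing to give here.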


Thanks @LeviViana. No, it’s not one-hot encoded; they’re exactly two probability vectors. I hadn’t thought about KL. Thanks a lot for your help.
