I have a multilabel task, such as text classification: one text can be labeled with several labels, as in the following example:

text_a = […001000100001110111…]; there are N labels, so text_a is an N-dimensional multi-hot vector.

My idea is to feed the text into an RNN and then map its output to a vector with the same dimension as the number of labels, i.e. N.
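To make the setup concrete, here is a minimal sketch of the model I have in mind (all names and sizes here are made up for illustration):

```python
import torch
import torch.nn as nn

class MultiLabelRNN(nn.Module):
    """Hypothetical sketch: an RNN encoder whose final hidden state is
    projected to N logits, one score per label."""

    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128, num_labels=20):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_labels)  # map to the N-dim label space

    def forward(self, token_ids):
        emb = self.embed(token_ids)        # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.rnn(emb)        # h_n: (1, batch, hidden_dim)
        return self.out(h_n.squeeze(0))    # (batch, N) raw logits

model = MultiLabelRNN()
logits = model(torch.randint(0, 1000, (4, 12)))  # batch of 4 texts, length 12
print(logits.shape)  # torch.Size([4, 20])
```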

There are three candidate loss functions I am considering:

- KLDivLoss

  The true labels and the predictions can be seen as two distributions; the model tries to make them more similar.

- MultiLabelSoftMarginLoss

  Is this a sensible choice? If it is suitable for the task above, how do I use it?

- Bayesian Personalized Ranking (BPR)

  Is there a ready-made BPR loss in PyTorch?
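To show what I mean, here is how I would wire up the first two options; a minimal sketch, assuming raw logits from the model and multi-hot targets (all tensors below are dummy data):

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 20)                      # raw model outputs, no sigmoid
targets = torch.randint(0, 2, (4, 20)).float()   # multi-hot label vectors

# MultiLabelSoftMarginLoss takes raw logits; it applies a sigmoid internally
# and averages a per-label binary criterion over the N labels.
criterion = nn.MultiLabelSoftMarginLoss()
loss = criterion(logits, targets)

# KLDivLoss treats both sides as distributions: the input must be
# log-probabilities and the target a probability distribution, so the
# multi-hot vector has to be normalized first.
kl = nn.KLDivLoss(reduction="batchmean")
log_probs = torch.log_softmax(logits, dim=1)
target_dist = targets / targets.sum(dim=1, keepdim=True).clamp(min=1.0)
kl_loss = kl(log_probs, target_dist)
```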

What is the difference between these loss functions for such a task? Are there other options I should consider?
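In case it clarifies the BPR question: as far as I know there is no built-in BPR criterion in PyTorch, so here is a hedged sketch of the pairwise loss I have in mind, where every positive label's score should exceed every negative label's score (the function name and shapes are my own assumptions):

```python
import torch
import torch.nn.functional as F

def bpr_loss(logits, targets):
    """BPR-style pairwise loss sketch (not a built-in PyTorch criterion).

    logits:  (batch, N) label scores
    targets: (batch, N) multi-hot {0, 1} labels
    """
    pos = targets.bool()
    diffs = logits.unsqueeze(2) - logits.unsqueeze(1)   # (batch, N, N) score differences
    pair_mask = pos.unsqueeze(2) & (~pos).unsqueeze(1)  # (positive, negative) label pairs
    # Maximize score gap of positives over negatives via -log(sigmoid(diff)).
    return -F.logsigmoid(diffs[pair_mask]).mean()

torch.manual_seed(0)
loss = bpr_loss(torch.randn(4, 20), torch.randint(0, 2, (4, 20)).float())
```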

Thanks~~~