I am working on a multilabel task such as text classification. One text can carry several labels, as in the following example:
text_a = […001000100001110111…], where there are N labels in total, so the label vector of text_a has dimension N.
My idea is to feed the text into an RNN, then map the output to a vector (a distribution over the labels) with the same dimension as the number of labels, i.e. N.
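To make the setup concrete, here is a minimal sketch of the model I have in mind. The class name `RNNMultiLabel`, the choice of a GRU, and all sizes (vocabulary of 1000, N = 18 labels, and so on) are placeholders I picked for illustration, not fixed parts of the task.

```python
import torch
import torch.nn as nn


class RNNMultiLabel(nn.Module):
    """Hypothetical model: embed tokens, run a GRU, project to N label scores."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, num_labels):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_labels)

    def forward(self, token_ids):
        embedded = self.embed(token_ids)         # (batch, seq_len, embed_dim)
        _, last_hidden = self.rnn(embedded)      # (1, batch, hidden_dim)
        return self.out(last_hidden.squeeze(0))  # (batch, num_labels), raw scores


torch.manual_seed(0)
model = RNNMultiLabel(vocab_size=1000, embed_dim=32, hidden_dim=64, num_labels=18)
token_ids = torch.randint(0, 1000, (4, 12))  # 4 texts, 12 token ids each
logits = model(token_ids)
print(logits.shape)
```

The output is left as raw scores so that whichever loss is chosen can apply its own normalization (sigmoid, softmax, or none).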
There are three candidate loss functions I am considering:
The true labels and the predictions can be seen as two distributions. The model tries to make them more similar.
Is this choice right? If it is suitable for the task described above, how should this loss function be used?
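This is roughly how I imagine the "two distributions" view would be wired up. It is only a sketch under my own assumptions: the multi-hot label vector is normalized so that it sums to 1, the N scores go through a softmax, and the two are compared with `torch.nn.functional.kl_div`. The tensors below are made-up toy values.

```python
import torch
import torch.nn.functional as F

# Toy scores for 2 texts and N = 4 labels (made-up values).
logits = torch.tensor([[2.0, -1.0, 0.5, 3.0],
                       [0.0, 1.0, -2.0, 0.5]])
# Multi-hot ground truth: text 0 has labels 0 and 3, text 1 has label 1.
targets = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 1.0, 0.0, 0.0]])

# Assumption: spread the probability mass evenly over the true labels.
# clamp avoids division by zero for a text with no labels at all.
target_dist = targets / targets.sum(dim=1, keepdim=True).clamp(min=1.0)

# kl_div expects log-probabilities as input and probabilities as target.
log_probs = F.log_softmax(logits, dim=1)
loss = F.kl_div(log_probs, target_dist, reduction="batchmean")
print(loss.item())
```

One thing I am unsure about is that the softmax forces the labels to compete with each other, which may or may not be what a multilabel task wants.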
- Bayesian Personalized Ranking (BPR)
Is there a ready-made function for BPR in PyTorch?
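I have not found a built-in BPR loss in `torch.nn`, so in case there is none, this is a minimal sketch of how I would write it by hand. The function name `bpr_multilabel_loss` is my own, and the adaptation to multilabel data is an assumption: every (true label, false label) pair of a text is treated as a (positive, negative) pair and penalized with `-log(sigmoid(score_pos - score_neg))`.

```python
import torch
import torch.nn.functional as F


def bpr_multilabel_loss(logits, targets):
    """Hypothetical pairwise BPR-style loss for multi-hot targets.

    logits:  (batch, N) raw label scores
    targets: (batch, N) multi-hot floats, 1.0 for true labels
    """
    # diff[b, i, j] = score of label i minus score of label j
    diff = logits.unsqueeze(2) - logits.unsqueeze(1)
    # pair_mask[b, i, j] = 1 where i is a true label and j is a false label
    pair_mask = targets.unsqueeze(2) * (1.0 - targets).unsqueeze(1)
    pair_losses = -F.logsigmoid(diff) * pair_mask
    # Average over all valid pairs; clamp guards against a batch with no pairs.
    return pair_losses.sum() / pair_mask.sum().clamp(min=1.0)


targets = torch.tensor([[1.0, 0.0, 1.0, 0.0]])
good_scores = torch.tensor([[5.0, -5.0, 5.0, -5.0]])  # true labels ranked on top
bad_scores = -good_scores                             # true labels ranked last
zero_scores = torch.zeros(1, 4)                       # no preference at all

good_loss = bpr_multilabel_loss(good_scores, targets)
bad_loss = bpr_multilabel_loss(bad_scores, targets)
zero_loss = bpr_multilabel_loss(zero_scores, targets)
print(good_loss.item(), bad_loss.item(), zero_loss.item())
```

The loss only looks at score differences, so it rewards ranking true labels above false ones but does not by itself give a threshold for deciding which labels to output.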
What are the differences between these loss functions for such a task? Are there any other options?