Selecting a Loss Function for a Multi-Label Task

I am working on a multi-label task, such as text classification, where one text can carry several labels at once. For example:
text_a = […001000100001110111…], where there are N labels in total, so the target vector for text_a has dimension N.

My idea is to feed the text into an RNN, then map the final output to a vector with the same dimension as the number of labels, i.e. N.
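The idea above could be sketched roughly like this (all sizes, the GRU choice, and the class name `MultiLabelRNN` are illustrative assumptions, not a fixed design):

```python
import torch
import torch.nn as nn

# Illustrative sizes; real values depend on the dataset.
N_LABELS = 20
VOCAB_SIZE = 1000
EMBED_DIM = 64
HIDDEN_DIM = 128

class MultiLabelRNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.rnn = nn.GRU(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        # Map the final hidden state to N raw label scores (logits).
        self.fc = nn.Linear(HIDDEN_DIM, N_LABELS)

    def forward(self, token_ids):
        emb = self.embed(token_ids)   # (batch, seq_len, EMBED_DIM)
        _, h = self.rnn(emb)          # h: (1, batch, HIDDEN_DIM)
        return self.fc(h.squeeze(0))  # (batch, N_LABELS) logits

model = MultiLabelRNN()
logits = model(torch.randint(0, VOCAB_SIZE, (4, 12)))  # 4 texts, 12 tokens each
print(logits.shape)  # (4, N_LABELS)
```

The output is left as raw logits so that the loss function can apply its own sigmoid/softmax.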

There are three candidate loss functions I am considering:

  1. KLDivLoss
    The true labels and the predictions can each be viewed as a distribution, and the model tries to make the two distributions similar.
  2. MultiLabelSoftMarginLoss
    Is this choice right? If it is suitable for the task above, how do I use this loss function?
  3. Bayesian Personalized Ranking
    Is there a ready-made function for BPR in PyTorch?
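To make the question concrete, here is a minimal sketch of how I imagine each candidate would be applied to the model's logits (shapes and the seed are arbitrary; the `bpr_loss` helper is hand-rolled by me as an illustration, since I have not found a built-in BPR loss; the `BCEWithLogitsLoss` line is only there as a sanity check of my understanding of `MultiLabelSoftMarginLoss`):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 20)                      # hypothetical model outputs: 4 texts, 20 labels
targets = torch.randint(0, 2, (4, 20)).float()   # multi-hot ground-truth labels

# 2. MultiLabelSoftMarginLoss: an independent sigmoid + binary cross-entropy
#    term per label, expecting multi-hot targets like the ones above.
ml_loss = nn.MultiLabelSoftMarginLoss()(logits, targets)

# Sanity check: with default reductions this should match BCEWithLogitsLoss.
bce_loss = nn.BCEWithLogitsLoss()(logits, targets)

# 1. KLDivLoss: treat the labels as a probability distribution. It expects
#    log-probabilities as input and probabilities as target, so the multi-hot
#    vector has to be normalized first.
log_probs = F.log_softmax(logits, dim=1)
target_dist = targets / targets.sum(dim=1, keepdim=True).clamp(min=1)
kl_loss = nn.KLDivLoss(reduction="batchmean")(log_probs, target_dist)

def bpr_loss(logits, targets):
    # Hand-rolled BPR sketch (assumption, not a library function): for each
    # sample, push every positive label's score above every negative label's.
    pos = targets.bool()
    per_sample = []
    for row_logits, row_pos in zip(logits, pos):
        pos_scores = row_logits[row_pos]
        neg_scores = row_logits[~row_pos]
        if pos_scores.numel() == 0 or neg_scores.numel() == 0:
            continue  # no ranking pairs for this sample
        diff = pos_scores.unsqueeze(1) - neg_scores.unsqueeze(0)
        per_sample.append(-F.logsigmoid(diff).mean())
    return torch.stack(per_sample).mean()

bpr = bpr_loss(logits, targets)
print(ml_loss.item(), kl_loss.item(), bpr.item())
```

Is this roughly how each of the three would be wired up, or am I misusing any of them?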

What are the differences between these loss functions for such a task? Are there other options I should consider?